EDITORIAL INTRODUCTION

In the spring of 1919 Doctor Holley, while acting as 
assistant director of the Bureau of Educational Research of the 
University of Illinois, tried out six group intelligence scales in 
the schools of Champaign, Illinois. A kind of survey, narrow 
but intensive in character, was thus afforded. The data, how- 
ever, with a little more analysis could be made to yield important 
results as to the reliability and validity of each of these six
scales as instruments for measuring intelligence. With this 
thought in mind Doctor Holley carried out some of the necessary 
analyses and wrote the monograph which follows. 


Of the six tests, three have become popular in a large 
way. They are the “Otis Group Intelligence Scale,” the “Primer
Scale,” and the “Virginia Delta I” (now known as the “Intelligence
Examination, Delta 2”). Besides the six which were used
in this investigation there were at least three others which might 
have been used. In all there appear to have been nine rather 
well-known tests at the time the survey at Champaign was 
started. 


Since then the number has been materially increased. 
Not only did several new tests come out during the school year, 
1919-1920, but at least three scales, complete in every essential 
detail, have been published this summer in anticipation of the 
“fall trade.” The World Book Company announces Terman’s
“Group Test of Mental Ability”; Lippincott announces the “Dearborn
Group Tests of Intelligence”; and the Bureau of Educational
Research of the University of Illinois announces the
“Illinois General Intelligence Scale.” It is apparent that the
movement to measure intelligence by means of group tests is 
well under way. 


Under these circumstances school people are inquiring 
somewhat anxiously, “Which among all the intelligence tests is 
best?” Like most general questions, this has no general an- 
swer. The “best” test is the one which is most appropriate. It 
may not be best at all times, with all pupils, and for all purposes. 
The term “best” therefore needs qualification. 


Nevertheless, no matter what the qualifications, there are 
certain characteristics which a good test—to say nothing of the 
best one—should possess. It should not require too much time
to administer. It should be capable of rapid and objective
rating. It should correlate highly, but not too highly, with 
teachers’ estimates of scholarship—say about 0.60. It should 
discriminate unmistakably between levels of intelligence which 
are known on other grounds to be different—e.g., the levels at 
different ages or grades. The subordinate exercises of which 
it is composed should test important mental traits and should 
contribute to the total score amounts proportional to the im- 
portance of these traits. Scores in the subordinate exercises 
should be relatively independent, for otherwise they merely tell 
the same story. Moreover, like the scales of which they form a 
part, they should discriminate between levels known to be dif- 
ferent. Both the entire scale and its subordinate exercises should 
yield very few zero scores and very few scores of the highest 
possible value. Indeed, there should not be many scores even in 
the region of these extremes. 
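
Two of these criteria lend themselves to a simple numerical check. The following is a minimal sketch only, assuming hypothetical test scores and teachers' estimates; the function names `pearson_r` and `extreme_share`, the five-point estimate scale, and the data are illustrative and do not come from the investigation itself.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def extreme_share(scores, max_score):
    """Fraction of scores that are zero or equal to the highest possible value."""
    extreme = sum(1 for s in scores if s == 0 or s == max_score)
    return extreme / len(scores)

# Hypothetical data: test scores (maximum 50) and teachers' estimates (1 to 5).
test_scores = [12, 18, 25, 31, 36, 40, 22, 28, 45, 15]
teacher_estimates = [2, 3, 2, 3, 4, 3, 3, 4, 5, 1]

r = pearson_r(test_scores, teacher_estimates)
share = extreme_share(test_scores, max_score=50)
print(f"correlation with teachers' estimates: {r:.2f}")
print(f"share of zero or perfect scores: {share:.2f}")
```

The same `pearson_r` function could be applied to pairs of subordinate exercises, where a low value would suggest that the exercises are relatively independent.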


All of the scales in this investigation were examined 
with reference to these points. The method is of necessity 
largely statistical; but the outcome is practical enough. Certain 
very definite recommendations and suggestions are made. It 
is believed, therefore, that with reference to a few important 
tests the serious student when asking the question, “Which test
is best,” will find, if not a general answer, at least something 
fundamental and satisfactory. 




B. R. BUCKINGHAM, 


Director, Bureau of Educational Research, 
University of Illinois. 


August 26, 1920.


MENTAL TESTS FOR SCHOOL USE


PART I—THE PRESENT STATUS OF MENTAL TESTING 


A—USES OF MENTAL TESTS 


I. The Recognition of Feeble-minded Children—Mental 
ability differs among individuals from very superior to very in- 
ferior with every possible gradation between these extremes. 
Degrees of difference in intelligence are usually unnoticed in a 
community because routine life does not reveal them to the gen- 
eral observer. We note only the grosser variations and these, 
as a rule, mainly when individuals are markedly defective. The 
village simpleton is a familiar figure. He is the roustabout who 
does light chores. He is the object of ridicule, the butt of the 
jokes of his more intelligent associates. Among the great mass 
of humanity, however, discriminations are not made; and when 
the question of competence is raised in the school or elsewhere, 
there is no satisfactory basis on which agreement may rest. 


In reality feeble-mindedness is present in nearly every 
community to a greater degree than has ever been recognized. 
The work of the Psychological Service of the United States Army 
leads one to believe that the number of people who, when ma- 
ture, do not exceed the mental development of the average nine- 
year-old child, is probably two or three out of each hundred of 
the population. Some authorities have placed the dividing line 
between feeble-mindedness and normality between the ages of 
ten and eleven or eleven and twelve. On this basis the feeble-minded
would probably comprise from 5 to 10 percent of the
total population of the United States.


Feeble-minded people are sadly limited in their ability to 
adjust themselves to social conditions. They are weak in their 
control of mind and body and difficult to teach. They have very 
poor memories and very poor discriminative power. Constant 
repetitions are required in order to teach them the simplest 
things. Many of them never learn to read or to spell or to do
simple arithmetic problems even if they are kept in school for
the entire compulsory education period. This applies especially 
to those whose mental ability does not exceed that of the average 
seven-year-old child. Those whose mental development is a lit- 
tle better than this may do something with the ordinary school- 
room tasks but the results are hardly proportionate to the cost 
in time and effort. The best that can be done for these people is 
to train them along manual lines. They can often be taught to 
do the ordinary home tasks of sweeping, dusting, washing 
dishes, peeling potatoes, bringing in coal and wood, mowing the 
lawn, chopping wood, and running errands of a simple nature. 
Even with constant supervision, there is little hope that those 
whose mental rating is below seven years can be made self-sup- 
porting. Many men of eight- and nine-year mental ability, how- 
ever, are getting along in the industrial world at the ordinary 
tasks which employ unskilled labor. They make a poor living 
to be sure, but they eke out an existence. 


The lower types of the feeble-minded have so little 
mental ability that they seldom engage in crime. Occasionally 
feeble-minded women even of low grade become social menaces 
but they do not usually take the aggressive part in their mis- 
demeanors. The higher grades of the feeble-minded, however, 
are a real social problem, for they are capable of participating 
in crimes of various sorts. Although comparatively few crimes 
are committed by real mental defectives, criminals actually 
exhibit every level of intelligence. Indeed they are more often 
characterized by moral than intellectual abnormality. On the 
other hand, people who are subnormal mentally are often model 
citizens, when social conduct is considered, because they have 
been trained to live correctly. Many criminals are defective in 
intelligence, but not all mental defectives are criminals even 
potentially. 


Mental tests are of value in detecting more accurately 
than personal judgment the different grades of feeble-minded- 
ness. The school may use the results of these tests in determin- 
ing those for whom ordinary school work is entirely unsuited. 
These pupils should be given school tasks, as far as possible, that 
are of the manual type, because this is for them the most hopeful 
field of training. Even in this work the same returns should 
not be expected that would be secured from normal children. It 
is wasteful to spend a markedly disproportionate amount of the 
school funds on this part of the population, though a portion of
the expense may be justified on the ground that the normal
children profit by the segregation of the defectives.


II. The Recognition of Mentally Backward Children—
Above the feeble-minded in mental development come those 
whom we call the mentally backward. These comprise from 10 
to 20 percent of the population, depending upon the criteria that 
are set up as the dividing lines between feeble-mindedness, back- 
wardness, and normal development. This is the class of our 
population from which, as a rule, petty criminals come. These 
are the people who are decidedly maladjusted under present con- 
ditions and who populate our slums and hovels. 


The backward learn slowly at school. They have poor 
memories, poor discriminative powers, and mediocre reasoning 
ability. If they are to be taught anything the process involves 
a large number of repetitions. As a rule, even when they have 
reached their physical maturity, they are still like children in 
many respects. They live in the present and care little and 
provide little for the future. In the schoolroom they are usually 
retarded; but they may have enough ability to do, in a mediocre 
way, the work of the grade in which they are classified if they 
are given extra attention. Teachers often fail to appreciate the
difference between the chronological ages of these pupils and those
of their classmates, and hence they fail also to detect their
backwardness. Developed along some lines, these backward children have
instincts and emotional reactions which are those of children of 
their own age. This side of their nature enables them at times 
to surprise the teacher with what seem to be bright responses;
and for this reason they are often rated higher in intelligence 
than they should be. 


The backward children in our schools should have special
treatment. If put in classes by themselves they can be given 
the requisite repetitions of subject-matter; and they may thus 
learn at the rate of which they are capable. They need a special 
course of study built for their needs. When in the classroom 
with normal children they are continuously required to do things 
more quickly than their mental ability permits. As a conse- 
quence they fail, although if they were given more time they 
could succeed. They acquire the habit of failure, of which so 
much has been written. Mental tests would reveal the true 
situation and permit proper provisions to be made.


III. The Recognition of Normal Children who are
Apparently Abnormal—Mental tests may also be used to detect 
normal children who are not making the progress of which they 
are capable. It occasionally happens that children who have 
average ability fail to keep up in their school work. In such 
cases it would be very profitable for the teacher to take the extra 
time needed to coach these children in order that they may pro- 
gress normally. Special attention given to normal children 
who have “lost out” for some reason or other, often pays very 
well—a point recognized by those superintendents who have 
organized “opportunity classes” to provide for them. Teachers 
of such classes, however, sometimes waste their energies on 
really defective children because the normal children have not 
been differentiated from them. If mental tests are to be used 
for this purpose they should be given along with tests in school 
subjects and, if a child is mentally normal according to the men- 
tal tests and retarded when judged by the school tests, it is 
obvious that extra attention given to his weakness will help to 
eliminate or at least to lessen it. Normal children are sometimes 
temperamental and fail to progress because they get “at outs”
with the teacher. Situations such as these may be revealed 
readily and the proper remedies may then be applied. 
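
The comparison described here, a mental test given alongside tests in school subjects, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Pupil` record, the idea of expressing school-test results as an "achievement age", and the one-year `tolerance` are hypothetical choices, not values given in the text.

```python
from dataclasses import dataclass

@dataclass
class Pupil:
    name: str
    chronological_age: float  # years
    mental_age: float         # years, from a mental test
    achievement_age: float    # years, from school-subject tests (assumed scale)

def needs_coaching(pupil, tolerance=1.0):
    """True when the pupil is mentally normal but behind in school subjects.

    `tolerance` (in years) is an assumed cut-off, not a value from the text.
    """
    mentally_normal = pupil.mental_age >= pupil.chronological_age - tolerance
    behind_in_school = pupil.achievement_age < pupil.chronological_age - tolerance
    return mentally_normal and behind_in_school

# Hypothetical pupils, all ten years old.
pupils = [
    Pupil("A", 10.0, 10.2, 8.5),   # normal ability, behind in school work
    Pupil("B", 10.0, 7.5, 7.8),    # low on both measures
    Pupil("C", 10.0, 10.5, 10.4),  # normal on both measures
]

flagged = [p.name for p in pupils if needs_coaching(p)]
print(flagged)
```

Only pupil A is selected: pupil B is low on both measures, which points to a different kind of provision, and pupil C shows no discrepancy at all.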

IV. The Discovery of Superior Children—Mental tests 
have special value in the selection of superior children for 
special classes. These children may be just as much above the 
average as the backward and feeble-minded are below it. Many 
of them could do the ordinary work of the eight years in the 
elementary schools in one, two, or three years less time. There 
are two ways in which provision may be made for these children. 
One is by allowing them to skip grades now and then. This 
device is not to be recommended without qualifications. If a 
child skips a grade he misses some of the vital things at times 
and may be handicapped in this way. Often, however, it may 
be better for the markedly superior child to skip grades and 
thus reach his school level, than to move along in lockstep style. 

The other way is to provide special classes for superior
children. If a number of these children are detected by the 
administration of mental tests, they may be placed in a class 
by themselves. Under these conditions they will make much 
more rapid progress than they would in regular classes. Such 
a special class makes unusual demands on the teacher, and great 
care must, therefore, be taken in selecting the one who is to lead
a group of superior children. The teacher must be above the
average in scholarship and be able at all times to keep up with 
the children in their thinking. In these special classes the work 
may be adjusted to the needs of the children. In some subjects 
they may be able to progress more rapidly than the average 
child. In others, the emphasis probably should be placed on 
supplementing the work, thus making it richer in content. 


If superior children are kept in classes with normal
children they often acquire bad habits. They are kept marking 
time at a point far below their possible working efficiency and, 
thus, acquire all the undesirable characteristics of mischievous 
children. Habits of idleness, disorder, and general inefficiency 
are often the result of this maladjustment. On the other hand, 
superior children should not be pushed too rapidly in school 
work as is often done when they receive extra promotions. If 
they are sent along at too rapid a pace they often reach levels 
where their mental ability is not equal to some of the tasks 
which are set before them. This is because the subject-matter 
has been graded to meet the needs of normal children whose 
emotional lives have matured in a definite relationship with 
their mental lives. Superior children with their unusual mental 
development are often merely normal in their emotional lives, 
having for example fourteen-year-old minds in ten-year-old 
bodies with ten-year-old emotions. When a superior child is expected
to feel and think in the same terms as a child several
years his chronological senior he is often unable to do so. This 
situation implies that if special classes become common, it may 
be necessary to modify the subject-matter used in classes for the 
gifted so that it may be fitted to them. 

One argument often made against the rapid promotion 
of superior children is that they are soon thrown into compan- 
ionship with older children. This criticism is a serious one. It 
is, however, anticipated by the provision for special classes 
advocated—a provision which groups a number of these super- 
ior children together. Where it is impossible to form classes for 
superior children, as will usually be the case in small school sys- 
tems, one should consider the situation carefully before making 
extra promotions. Yet a superior child will sometimes reach a 
place where there is almost nothing for him to do in the grade 
in which he is placed. A child without something to do is a 
menace to himself. Under these circumstances, it may be the 
plain duty of the school to promote him.


V. The Grading of Children for General Promotion—
Another use that may be made of mental tests is to reveal in- 
dividual differences as a basis for grading and promotion. It is 
often necessary to regrade children who are changing schools.
Under these conditions one cannot rely upon their marks be- 
cause, coming from different schools and from different teach- 
ers, the children have been rated according to different stand- 
ards. A good mental test will enable one to regrade the children 
in a fairly satisfactory way. These classifications can then be
compared with the scholarship achievements of the pupils dur- 
ing the first month of the year and minor adjustments may be 
made. When pupils pass from one type of school to another— 
as from elementary to high school, or from high school to college 
—the application of mental tests as a basis of judging fitness 
for entrance and of sectioning is important. It is probable that 
the near future will see an extensive use of mental tests as a 
means of determining fitness to enter new schools. 


The use of mental tests is particularly appropriate in the 
junior high school where sections are often formed on the basis 
of mental ability. In systems where a grade has four or five 
sections organized on this basis, it has been found that the best 
sections often do twice as much work as the poorest. 


VI. The Determination of General and Special Ability 
for Educational and Vocational Guidance—The near future will 
also probably witness the extended use of mental tests in an- 
other field—the field of vocational and educational guidance. 
Under present conditions there are few tests which can be 
recommended even in a limited way as suitable for this work.
This situation, however, is likely to be temporary. We are mak- 
ing rapid strides in the preparation of mental tests. It is prob- 
able that the year 1920 will see the publication and the stand- 
ardization of a number of mental tests both general and special. 
Some of these, no doubt, will be suitable for this work.*

*This paragraph was written in the fall of 1919. The prediction
appears to have been justified.


There are two phases of this problem. It may be attacked
from the point of view of so-called general intelligence.
A certain degree of mental ability is necessary for the successful
negotiation of most tasks. The amount of such ability can be
determined in a fairly accurate way for each vocation. Individuals
who do not in this respect measure up to the minimum requirement
in an occupation will do well not to attempt to enter
it. The same is true with respect to higher education. The tests 
that have been given thus far seem to show that unless a person 
has the necessary fundamental basis it is undesirable for him 
to attempt to secure a higher education. At present we are ap- 
plying crude methods of selection to nearly all of these activities. 
Oftentimes it may be merely vague personal opinions or chance 
peculiarities which form the basis of the judgment of the 
“expert.”


The other phase of this problem concerns the specific 
abilities which are needed in special lines of work. A few tests 
have been devised which attempt to pick out the mental pecul- 
iarities of people who are successful in music, art, or other spe- 
cific lines. We are making a beginning in this field and probably 
will make rapid progress from now on. Enough has been done 
in industry to indicate also that different occupations make their 
special demands. These specific requirements can be determined 
and individuals, who are not equipped with the peculiar capaci- 
ties needed, can be rejected by the employment office. Thus, for 
example, one occupation may demand clear vision, another quick 
perception, and a third delicate motor adjustments. The degree 
to which each of these traits must be present to avoid probable 
failure may be established, and individuals not meeting the requirements
for the occupation in question may be diverted
from it. 


The problems of vocational and educational guidance are 
much more complex than the problems of the employment man- 
ager. The expert in vocational or educational guidance is ex- 
pected to make a wise recommendation for every individual who 
comes up for an analysis. The employment manager, on the
other hand, usually has a number of people from whom he is 
privileged to select the best. This makes it possible for the tests 
used by the employment manager to have an element of error 
in them that would be fatal to the success of the test used by the 
counselor of individuals. The latter is most concerned with the 
future possibilities of the individual. Will the boy or girl who 
is receiving advice develop with further education in a way 
that will make his or her adjustment to the required conditions
easy? The future must be considered to the extent of five or 
ten years. On the other hand, the employment manager is concerned
with the immediate present. Only rarely will he consider
the possibilities which may be attained by the individual
five or ten years hence. 


B—WHAT MENTAL TESTS MEASURE 


I. Phases of Mentality which are Measured—In the 
popular mind there is much confusion as to just what mental 
tests measure. In general the thing sought to be measured by 
present mental tests is potential adaptability to conditions. How 
readily can the subject adjust himself to new situations? How 
quickly can he learn? To what extent can he profit by ex- 
perience? It is a question of potential ability whether it arises 
from inherited native capacity or not. 


From a more literal point of view, mental tests may be 
said to measure only the individual’s performance. With his 
performance as a hypothesis we infer his ability. How near we 
come to the truth will depend upon how closely what he does cor- 
responds to what he can do. In some cases the inference will not 
do the individual justice because he has not done his best—per- 
haps not nearly his best. But the standardization of procedure 
in giving tests and their repetition on different occasions with 
the same individuals will greatly reduce the likelihood of error in 
inferring ability from performance. 


Moreover, we draw similar conclusions in regard to 
human behavior of all sorts. In other words, we infer ability 
from its outward manifestation in performance. A salesman’s 
ability is gauged by the amount of his sales, a mechanic’s abil- 
ity by his visible product. The writer is judged by his books, the 
preacher by his sermons, the physician by his cures, and the 
business man by his holdings. In a world of action, ability which
does not eventuate in action is as if it were not. 


Yet potential action—i.e. ability—must ever be in ad- 
vance of actual performance. The margin between what can be 
done and what under given conditions is done varies between 
individuals and for the same individual at different times. When 
the conditions are favorable the margin is contracted and per- 
formance approaches the level of ability. Under unfavorable 
conditions performance may lag far behind ability. 


How wide the habitual margin is for a particular individ- 
ual is of little consequence. He may plead greater ability than 
he shows, but we shall continue to discount it to the level of his
customary performance. Indeed, we may be theoretically as
well as practically correct in so doing. It may very probably be 
true that a person, perhaps through emotional or volitional de- 
fect, exhibits a characteristic discrepancy between intellectual 
ability and action—a trait which is as peculiarly his own as his 
blue eyes or his aquiline nose, a trait in virtue of which an unusually
large amount of his mentality cannot be brought into
play. We are aware that in speaking of “amount of mentality” 
in this connection we are using a crude expression. It is only as 
the mentality permits action that we can speak of its “amount.” 
Of what is over and above that which functions, we know noth- 
ing. It may be much or little, but since it accomplishes nothing 
further than to provide a working margin, we may safely neglect 
it in our tests of intelligence. 


But the variation in this margin for the same individual 
at different times is serious. Unless we can allow for this 
variation or reduce it to a negligible amount, the reliability of 
our results is seriously impaired. A great deal of effort has, 
therefore, been expended in order that our results in terms of 
performance may become a usable index of mental ability. It 
is evident that the causes of unreliability—of the variability of 
this margin of which we have been speaking—lie in the chang- 
ing conditions under which performance takes place. 


These conditions are both external and internal. The ex- 
ternal conditions include those of temperature, ventilation, 
illumination, and in general all the things which may at the time 
be present to the senses. In mental testing a set of especially 
important external factors has to do with the examiner. His 
directions may be clear or faulty, may give too much or too little 
information, may give a right or wrong “mental set.” His voice 
may be entirely or but partly audible, harsh, or pleasing. His 
manner may be stimulating or depressing. Effort is made to 
reduce the variation due to these external conditions by stand- 
ardizing them. This is especially true with regard to the exam- 
iner and the directions which he is to give. Some of the other 
external conditions—e.g. such variations in ventilation as are 
commonly found in schoolrooms—do not appear to make appre- 
ciable differences in performance. On the whole we believe that 
variations due to external conditions have been reasonably con- 
trolled where carefully devised tests have been properly used. 
Greater control is possible especially through more adequate 
training of examiners and through the derivation of tests which
require less special training on the part of the examiner. Progress
is being made in both these directions.

Internal conditions under which performance takes place 
are only partly controllable. In the schoolroom we may, however,
do more of this than is at first apparent. For example, the
element of fatigue may be measurably controlled by a uniform 
schedule of work prior to the time of testing. On the other 
hand, remoter factors having to do with the condition in which 
children come to school are less easily controlled if indeed they 
can be controlled at all. 

Unreliability due to variation in both external and inter- 
nal conditions may also be reduced by repeating tests, by giving 
parallel tests, and by giving several different tests—in short by 
securing at different times additional data regarding the intel- 
ligence of the examinee. The extent to which this should be done 
in order to secure results of a given reliability is one of the 
promising statistical fields in which workers are now engaged 
but in which they have not, as yet, secured usable results. Mean- 
while, however, it is evident that the reliability of a first determi- 
nation of the mentality of an individual is greatly increased 
when no more than a single additional and independent determ- 
ination is found to agree with it. Further determinations, if 
they are still substantially in agreement, will establish a degree 
of probability amounting to practical certainty. If determina- 
tions are not in reasonably close agreement, they may properly 
be regarded as chance variations from a presumably truer de- 
termination. The average of the ascertained determinations 
may be taken as the best representation of this truer determi- 
nation. 
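
The averaging of several determinations can be illustrated with a short sketch. It assumes, as the paragraph does, that the separate determinations are chance variations around a truer value; the standard error of the mean used here is a conventional statistic introduced for illustration, not a procedure taken from this monograph, and the figures are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def combine_determinations(scores):
    """Return the average of repeated determinations and its standard error."""
    average = mean(scores)
    # The standard error of the mean shrinks as more determinations are added.
    std_error = stdev(scores) / sqrt(len(scores)) if len(scores) > 1 else None
    return average, std_error

# Hypothetical mental ages (years) from three separate testings of one pupil.
determinations = [9.5, 10.0, 10.5]
average, std_error = combine_determinations(determinations)
print(f"best estimate: {average:.2f} years")
print(f"standard error: {std_error:.2f} years")
```

With a single determination no standard error can be computed, which is one way of seeing why even one additional independent determination adds so much to confidence in the result.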

In any event, therefore, inferring ability from perform- 
ance is no new procedure. With care in administering mental 
tests, it is probable that we may make such inferences in refer- 
ence to intelligence with reasonable accuracy. 


Mental tests do not measure native capacity or general 
intelligence directly. They only indirectly get at these as they 
have been modified by experience. Even tests which are com- 
posed of the most perfect uncoachable elements are attempted 
more successfully by those who have had a thorough education 
than by those who have never been inside of a schoolroom. It is 
conceivable that tests may be devised which will measure pure 
intelligence—i.e. native capacity; but present tests are not of 
this nature, and it is questionable whether such tests are desirable.
The individuals who are being measured are not the same
individuals they would have been, if they had had different ex- 
periences. The important thing from all practical points of view 
is the present status of the individual. Theoretically, it may be 
interesting to compare two people on the basis of their pure
native capacity, but for most practical purposes this question is 
unimportant. Experience plus native capacity has made the 
present individual, and it is he who must be considered. 


II. Phases of Mentality which are not Measured—Mental
tests do not measure the emotional side of life. They do not
test one’s ability to feel or to appreciate the finer things in art, 
nor do they test one’s feeling of respect for one’s fellows. They 
do not measure the ability to persevere, or to “carry on,” except 
in a very limited way. Many people with mediocre endurance 
have sufficient power to enable them to work at a high pitch dur- 
ing the brief interval of a mental test, but they would be entirely 
unable to work twelve hours at a stretch, day after day. Mental 
tests do not measure the motives which guide the conduct of an 
individual—his conscience, his ideals, his honesty, and dependa- 
bility. 

This point of view, however, does not take into account 
the fact that these so-called emotional characteristics are appar- 
ently correlated in general with measurable mental character- 
istics. Usually the most brilliant individual from the mental 
point of view also has a very large endowment in ideals, endur- 
ance, persistence, and appreciation. In so far as the mental and 
emotional characteristics of human nature are correlated, tests 
of mental ability are also tests of emotional ability. The excep- 
tions, however, are responsible for much of the criticism which 
is directed toward mental tests. 


Again, mental tests do not measure directly the ability to 
use habits which have been acquired. Comparatively low-grade
individuals may learn to do things which are mainly habitual 
activities. Through much practice they may have been perfect- 
ed in the habits involved and, once having learned the habits, 
they may be able to practice them as effectively as the average 
individual. Consequently, mental tests so far as they test the per- 
formance of acquired habits may not be discriminative. Special 
tests are needed for this purpose. Mental tests, however, will 
indicate to a certain degree the speed with which individuals 
may acquire new habits and the facility with which they may 
modify old habits in new situations.


C—TYPES OF MENTAL TESTS


I. Individual Scales. 


a. Characteristics of individual scales—The term “in- 
dividual scales” is applied to those major measuring instruments 
which are used to test individuals one at a time. Such scales 
are sometimes called interview tests. They are composed of 
many items of very diverse character. These different bits of 
test material aim to determine the stage of development in the 
different functions involved in mental ability. The items differ
in difficulty from very easy to very hard. The difficulty of each
item is known with a reasonable degree of definiteness and the
response to all the questions is combined in one value. This value
is commonly expressed in the form of a mental age. 


In general, individual tests are regarded as our most ac- 
curate instruments. They have their limitations, however. If a 
pupil is sick or is unusually bashful, or becomes angry, the re- 
sults are not descriptive of his real ability. Anything that 
prevents full cooperation with the examiner will invalidate the 
results. The different scales which have been devised for individ- 
ual use have their own special limitations. A scale which is in- 
tended to measure children only between the ages of seven and 
fourteen should not be used in testing the ability of people 
whose mental age runs above these years. Further, a test which 
is merely a test of performance may not be a test of linguistic 
or other types of ability. 


All the individual scales now in use require carefully 
trained examiners. Each item in the test will result in accurate 
information only after a careful following of directions and an 
accurate evaluation of the responses. A slight deviation from
the standard wording of the directions will materially alter the 
response. Leniency or severity in the scoring of responses will 
influence the conclusions. Each answer must be evaluated 
accurately in the same manner that it was evaluated when the 
scale was devised. This can be done only by those who have 
made a thorough study of the scales and have some knowledge 
of child psychology. Some people are by temperament entirely 
unfitted to give individual examinations. Being unable to secure 
the cooperation of the subject, they obtain erroneous results.
Thus an error of as much as two or three years may be made in 
the determination of the mental age of the subject. 


The greatest objection, however, to the individual scales is
that they are time-consuming. It takes from thirty minutes to
two hours to administer any one of the three scales described
below to a single child. This fact makes individual methods of
testing so expensive that they can never come into general use
in the schools.


b. Available individual scales—1. “The Point Scale” 
by Yerkes, Bridges, and Hardwick. The manual describing this 
scale is published by Warwick and York, Baltimore, Maryland. 
The scale is composed of 20 different tests, and the total possible 
score is 100 points. Within each test the items are graded 
somewhat in difficulty. Some of the tests are much easier than 
others but there is no careful gradation of the tests from very 
easy to very difficult. When this scale was devised it was in- 
tended to be valid between the mental ages of seven and four- 
teen. It has been found, however, in practice that if the scale 
is used with adults the results are questionable when the mental 
age exceeds twelve. The Point Scale may be given in less time 
than the two following scales, but it is a comparatively inflexible 
instrument. It deals primarily with literary material and places 
the unschooled individual at a decided disadvantage. The tech- 
nique of its administration is somewhat difficult and no one 
should attempt to use it unless he has made a careful study of 
the manual and has been supervised in administering it. In 
other words, it cannot be given by an untrained examiner. The 
Point Scale is a modification of the early Binet-Simon Scale with 
the addition of a few new elements. 


2. “The Stanford Revision of the Binet-Simon Scale” 
by Lewis M. Terman. The manual describing this scale is entitled
“The Measurement of Intelligence” by Lewis M. Terman,
published by the Houghton Mifflin Co. The envelope of test materials
needed in administering the scale is furnished by the
same publishers. This scale, as its name implies, is a revision
and extension of the Binet-Simon Scale. It is composed of 90
different tests arranged in 12 groups corresponding to mental 
levels of from three to eighteen years. The large number of tests 
permits at least six of them to be included in each age group, 
thus securing a comparatively high reliability. The length of 
the scale, however, increases the time needed for its administration.
Few workers who do thorough testing of children take less
than an hour for each child when using this scale. In some cases
it is necessary to use an hour and a half or two hours to complete
the test according to instructions.


2 The materials needed for the administration of the three individual scales
described are sold by C. H. Stoelting Co., 3047 Carroll Avenue, Chicago, Ill.


The scale is accompanied by very carefully prepared 
directions. Since it is composed of so many items, it can be ad- 
ministered successfully only by those who have had a thorough 
training. In comparison with the Point Scale it requires two or 
three times as much effort to learn to give the Stanford Revision. 
The results secured, however, are usually considered to be more 
significant. Individuals who have not gone to school, however, 
are penalized by the literary character of many of the tests 
and do not do themselves justice. 


3. “A Scale of Performance Tests” by Rudolf Pintner
and Donald G. Paterson, published by D. Appleton and Co., New 
York. The performance scale as devised by these authors has 
proved to be useful for measuring the mental ability of illiterates 
and foreigners. In the Psychological Service of the United 
States Army a modification of this scale was used with those 
men who could not be tested with the Point Scale or the Stanford 
Revision. As presented in this book, the scale is somewhat poor- 
ly adapted for school use. Modifications can be made, however, 
which will make it helpful in those situations where literary 
material cannot be used for test purposes. 


II. Group Scales. 


a. Characteristics—There are already a number of 
group scales for measuring the mental ability of children and 
adults. These are made up of several graded tests each of which 
is composed of individual items which are comparatively homo- 
geneous. The theory underlying these tests is that several tests 
measuring different mental functions will measure general in- 
telligence when the results are pooled. As at present arranged, 
it is felt that these group scales are not as accurate in their 
measurements as individual scales. In all probability, however, 
as high a degree of accuracy can be secured from the use of a 
number of group scales as from a single individual scale. This 
point has not been definitely settled at this time, however, and 
additional evidence is needed to guide us properly. Where a 
comparatively rough estimate of the mental ability of people is 
desired, these group scales answer the purpose very well. They 
have the decided advantage over the individual scales that they 
do not require much time per subject for their administration. 
Moreover, most of them are not as complicated as the individual
scales, and they can, therefore, be administered by intelligent 
people who have had comparatively little training. The points 
to be remembered in the administration of any one of the group 
intelligence scales are seldom as numerous as the instructions 
for two or three single tests in the Stanford Revision. It is very 
probable that group scales will supplant individual scales for 
general purposes. They will indicate those individuals who 
deviate from the norm and then, if the results need confirmation, 
it will be possible to give individual tests or to make additional 
studies of these unusual individuals. 


b. Existing group scales—Nine important group intelligence
scales have lately come to our attention. Doubtless
there are others; for during the past six months a number of 
psychologists have been busy developing group scales. Under 
the auspices of the Bureau of Educational Research, six of 
these nine scales have been tried out during the past year. De- 
tailed results will be given in Part II of this report. The six 
scales are the following: 


1. “Otis Group Intelligence Scale,” devised by Dr. 
Arthur S. Otis; published by the World Book Company.


2. “Classification Test,” devised by Dr. W. W. Theisen 
and Mrs. Cecile White Flemming. Announced for publication by 
Teachers College, Columbia University. 

3. “Group Test for Grammar Grades,” devised by Pro- 
fessor Guy M. Whipple; published by the Public School Publish- 
ing Co., Bloomington, Illinois. 

4. “Primer Scale,” devised and published by Mrs. Luella
W. Pressey, Indiana University, Bloomington, Indiana. 

5. “Virginia Delta I,” devised by Professor M. E. Hag- 
gerty for the Virginia Educational Commission; published by 
the World Book Company under the name Intelligence Examina- 
tion, Delta 2. 

6. “Sentence Vocabulary Scale,” devised by the writer; 
published by the Bureau of Educational Research, University of 
Illinois, Urbana, Illinois. 

The remaining three of the nine group scales to which
we have referred were not tried out. They are briefly described
below. The first had been used earlier in the year at Danville
and will be reported elsewhere. The second was for more advanced
pupils than we were testing, and the third became available
too late for use.


* Spring of 1919.


1. “Indiana Group Point Scale,” devised and published 
by Sidney L. Pressey, Indiana University, Bloomington, Indiana. 
This group scale was one of the first to be published. As a
pioneer scale it deserves no small credit, but it contains defects 
in administration which will prevent it from becoming popular 
in its present form. It is long and very exacting on the examin- 
er; and the scoring is somewhat difficult. It cannot be given 
by teachers with success unless they have been carefully trained 
in its administration. The units of the scale are somewhat 
coarse and its discrimination is not very accurate. There are 
ten tests, each containing 20 items which are supposed to meas- 
ure ability from the third grade through the high school. 


2. “Psychological Examination for College Freshmen 
and High School Seniors, Parts A and B,” devised and published
by L. L. Thurstone, Carnegie Institute of Technology, Pitts- 
burgh, Pennsylvania. This group scale is arranged in what is 
known as the “Omnibus Form.” Its administration is exceed- 
ingly simple since the examiner has almost nothing to do except 
to start and stop those taking the test. The blanks contain com- 
plete directions. Little can be said by the writer as to the value 
of this scale. The materials used are approximately the same 
as those used in the Alpha Army Test. There is little doubt but 
that they are difficult enough for the groups of students (col- 
lege freshmen and high-school seniors) for whom they have been 
devised. 


3. “Virginia Delta VII for Grades I to III,” devised by
Professor M. E. Haggerty for the Virginia Education Commis- 
sion; published by the World Book Company under the name 
Intelligence Examination, Delta I. This scale is one of the 
latest that has come to the attention of the writer. It seems to 
offer possibilities which will make it valuable for the primary 
grades. Nothing further can be said about it at this time since 
no published data are available.*


* Since this was written, standards have been provided. They are furnished 
when the tests are purchased. 


PART II—COMPARISON OF GROUP MENTAL SCALES
A—INTRODUCTION 


The rapid development of group scales has already been
referred to. Some were planned before the United States en- 
tered the World War, and have been gradually developed since 
then. Others originated in connection with the work of the 
Psychological Service of the United States Army. Thus a num- 
ber of group scales have become available for school use without 
much knowledge of their appropriateness for such use. Instead 
of considering each of these instruments from an a priori point
of view, we have preferred to administer them to public school 
children under school conditions and to draw conclusions from 
the facts as thus revealed. 


The opportunity to do this presented itself in connection 
with the work of the Bureau of Educational Research during 
the second semester of 1918-1919. With the cooperation of teach- 
ers and supervisors, six scales were administered to the school 
children of Champaign, Illinois, in the elementary and high 
school. The following scales were used: (1) Otis Group Intel- 
ligence Scale; (2) Classification Test, Form A; (3) Whipple’s 
Group Tests for Grammar Grades; (4) Pressey Primer Scale; 
(5) General Examination No. 1—Virginia Delta I, and (6) 
Sentence Vocabulary Scales. 


B—ADMINISTRATION OF TESTS 


Approximately twenty-five hundred children were tested 
with one or more scales. With the exception of the Sentence 
Vocabulary Scales, which were administered by the individual 
teachers, all of the scales were given by the writer or by super- 
visory teachers or trained workers. Consequently, all of the 
data, with this single exception, may be considered to have been 
secured by disinterested people who could be relied upon to ad- 
minister the scales according to instructions. 


I. The Otis Scale—Due to physical limitations the administration
of the Otis Scale was restricted to those grades in
which it was thought that it could be given with greatest suc- 
cess. Furthermore, it seemed best not to attempt to give it to all 
of the schools in the city. Consequently it was offered only to 
grades VI to XII inclusive. In grades VI to VII the children
in one average school alone were examined. In the eighth grade 
and in the high school the children were selected at random and
were probably representative of these grades for the city. The 
first administration of the Otis Scale was in the eighth grade. 
The reaction of the children seemed to indicate that the scale 
would not be suitable for the lower intermediate grades. How- 
ever, it was decided actually to give the test in the sixth and 
seventh grades to learn positively whether or not it was suitable
for these grades. A general idea of the results obtained from 
administering the Otis Scale may be gathered from the central 
tendencies and variabilities for each grade as shown in Table 
I. The maximum score for this scale is 230.


TABLE I. TOTAL SCORES IN THE OTIS GROUP INTELLIGENCE SCALE


                                        GRADE
                      VI | VII | VIII | IX | X | XI | XII
No. of pupils         [illegible] ... 54
Average               [illegible]
Standard deviation    15.5 | 28.5 | 30 | [illegible] | 24
Median                [illegible] ... 144

II. Classification Test—The number of pupils who could
be tested by this scale was restricted by almost the same factors 
which limited the administration of the preceding scale. It 
seemed best not to test the same children as were tested with the 
Otis Scale. From the standpoint of the statistical study of the
scales, it would have been desirable to measure identical children 
with them, but from the standpoint of the school system, it was 
felt to be better to test different children with each scale in order 
that a wider survey might result. In every case, however, at 
least one of the other four scales was given to each group of 
children examined by the Otis and Classification scales. It ap- 
peared likely that the Classification Test could not be used to ad- 
vantage below the fifth grade. Accordingly, it was administered 
in grades V to XII inclusive, with the results indicated in Table 
II. In Table II are also given data obtained by Dr. Theisen 
from several Wisconsin communities. 


TABLE II. TOTAL SCORES IN THE THEISEN-FLEMMING CLASSIFICATION TEST


                                             GRADES
                        V | VI | VII | VIII | IX | X | XI | XII
Champaign, Ill.
  No. of pupils         [illegible] | [illegible] | 62 | [illegible] | 61 | 24 | 33 | 31
  Average               60 | 84 | 98 | 108 | 118 | 125 | 134 | 141
  Standard deviation    18.5 | 16 | 18.5 | 16.5 | 22 | 27.5 | 22 | 26
  Median                [illegible]
Wisconsin 4 (six values printed per row; grade columns not recoverable)
  No. of pupils         1142 | 101 | 608 | 289 | 118 | 262
  Average               75 | 90 | 108 | 115 | 112 | 128
  Standard deviation    22 | 21 | 24.5 | 29.5 | 20.5 | 24.5
  Median                [illegible] | [illegible] | 109 | 114 | 112 | 122


4 Unpublished material furnished through the courtesy of Dr. W. W. Theisen. 

III. Whipple’s Group Test—It did not take very much
work with this scale to show its general administrative inferior- 
ity in its present form to the other scales which were studied. It 
was found that much time was needed to give it and that a num- 
ber of things about the scoring make it unsatisfactory from 
that point of view. The edition of the scale used at Champaign 
was the first offered by Dr. Whipple. It, therefore, contained 
defects which have been eliminated in later editions. For ex- 
ample, Test 5 was not printed in a form that was intelligible to 
the children, and it could not be used in this study. Moreover, 
there was no authorized procedure by which the scores in the 
different tests could be converted into a total score comparable 
to the total scores of other scales. It was administered to but 
145 children. They were in grades IV to VI inclusive. 


IV. Pressey’s Primer Scale—The Primer Scale was ad- 
ministered throughout the city in grades I to III inclusive. The 
distribution of the scores is shown in Table III. 


TABLE III. DISTRIBUTION OF TOTAL SCORES IN PRESSEY’S PRIMER 
SCALE BY GRADES 


SCORE               GRADE
                I      II     III
04 | 1 
5—9 | 2 

10—14 | ey | 

15—19 9 | 4 

20—24 8 1 

25—29 8 3 af 

S034 9 3 2 

30—39 16 7 5 

40—44 oO 11 8 

45—49 18 yaya 8 

50—54 32 29 ates 

5db—5b9 iss ape Dit 

60—64 ites 31 32 

65—69 6 33 ell 

10—T4 8 HAL 

75—T19 af 23 

80—84 ee 10 

85—89 | iS 

90—94 | | if 
Total                 [illegible]   183    [illegible]
Average               43.3   56.4   64.1
Standard deviation    14     12.5   12
Median                44     57     64

V. Virginia Delta I—The materials for this test were
furnished by Dr. M. E. Haggerty, Director of the Division of
Tests and Measurements, Virginia Education Commission. In order that
norms with which to compare the results of this test, which was
being given in Virginia, might be available, 1,200 copies were
supplied to the Bureau of Educational Research. These proved
sufficient to test all the children of Champaign, Illinois, in grades 
III to VIII inclusive. The results are shown in Table IV. 


TABLE IV. TOTAL SCORES IN VIRGINIA DELTA I


SCORE                        GRADE
             III    IV     V      VI     VII    VIII
0O— 9 | i 
10— 19° Be 
20— 29 11 
30— 39 2A ae | 2 
40— 49 Path 22 
50— 59 27 30" i) 19 1 
60— 69 2, 45 30 8 3 1 
70— 79 7 AT ia dea cae 20 6 0 
80— 89 if 23m ai oO 25 11 4 
90— 99 2 9 | 84 32 | Bye 16 
100—109 7 17 31 | 29 38 
110—119 i | 8 28 era.) 1 ake! 
120—129 ib | 4 10 | 28 | 388 
1380—-139 0 3 | Sake 38 
140—149 a dl a as 
150—159 | ia 1 
160—169 | | | | a 
Total                 [illegible]   187    201    159    167    180
Average               48     69.1   82.7   102.7  [illegible]  [illegible]
Standard deviation    [illegible]
Median                48     [illegible]

VI. Sentence Vocabulary Scales.


a. Origin—The high degree of reliability of the vocab- 
ulary test contained in the Stanford Revision of the Binet-Simon 
Scale suggested that this material, if arranged as a group test, 
might prove valuable. Accordingly, sentences were devised each 
of which contained one of the words from the vocabulary lists 
of the Stanford Revision. The last word of each sentence was 
one of four words placed some distance to the right of the body 
of the sentence. The pupil taking the test was directed to un- 
derline one of the four words in each line which completed the 
sentence satisfactorily. The sentences were divided into two 
groups of fifty each on the same basis that Dr. Terman used in 
dividing his list. These two groups were called Series G and 
Series H. When thus arranged the sentences were mimeo- 
graphed and were administered to several classes of children 
without a time limit. This preliminary use of the material re- 
vealed merit, and it was revised to remove obvious crudities of 
construction. The two series were then given to all the children 
in grades III to XII inclusive of the Champaign public schools. 
The papers were scored by deducting from the number of cor- 
rectly underscored words one-third the number underscored in- 
correctly. This was done to reduce the effect of chance. Where 
a child underscored more than one word in a line, the sentence 
was counted as omitted. 
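
The scoring rule just described can be sketched in Python. This is only an illustration under stated assumptions: the function name `score_series` and the representation of a paper as one set of underlined positions per sentence are hypothetical and do not come from the original study. With four alternatives, a blind guess is right once for every three times it is wrong on the average, so deducting one-third of the wrongs cancels the expected gain from guessing.

```python
from fractions import Fraction


def score_series(marked, key):
    """Score one paper as rights minus one-third of wrongs.

    marked: one set per sentence holding the positions (0-3) the pupil underlined
            (hypothetical representation, assumed for this sketch).
    key:    the correct position for each sentence.
    A sentence with no mark, or with more than one mark, counts as omitted.
    """
    right = 0
    wrong = 0
    for choices, answer in zip(marked, key):
        if len(choices) != 1:
            continue  # omitted, or spoiled by underlining more than one word
        if answer in choices:
            right += 1
        else:
            wrong += 1
    return Fraction(right) - Fraction(wrong, 3)
```

A paper with one sentence right, two wrong, one doubly underlined, and one left blank would thus receive one right less two-thirds, or one-third of a point.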


b. Results—The analysis of the results soon revealed a 
wide deviation for individual pupils between the scores made 
in Series G and in Series H, although the median and average 
scores for the two series were about the same in a given grade. 
These differences were large enough to reduce the correlations 
between individual scores in the two series to surprisingly low
values. In no grade was the correlation over +0.58. (See
Table XXVI.) 
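
A correlation of this kind between a pupil's scores on the two series can be computed as sketched below. The text does not state which coefficient was used; the product-moment (Pearson) coefficient is assumed here, and the probable-error formula, 0.6745(1 - r²)/√n, is the conventional one of the period rather than anything quoted from this report. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between paired scores (assumed coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probable_error(r, n):
    """Conventional probable error of r for n pairs: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)
```

Here `xs` would hold each pupil's Series G score and `ys` the same pupil's Series H score, taken one grade at a time.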


When the results of two tests of the same kind show as 
much deviation for the individuals as this, the obvious thing to 
do is to combine the two scores into single indices. This was 
done, and the distribution of the total scores is presented in Table 
V. The deviations found between grade scores in Series G and in 
Series H led to a rearrangement of the two series. The total 
number of times that each of the one hundred sentences was 
completed correctly was computed for each grade. These results
were reduced to percents, using as a base the total number
of children who took the test in the grade in question. This pro- 
cedure counts errors and omissions the same. With the result- 
ing percents as a basis, the sentences were rearranged and are 
now presented as Series I and Series II. It is probable that 
these lists are as nearly of equal difficulty for the different 
grades as statistical computations can insure. 
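
The computation of percents and the rearrangement into two series may be illustrated as follows. The percent step follows the text: the base is every child tested, so errors and omissions count alike. The text does not say how the sentences were dealt into Series I and Series II; the alternating assignment by rank shown here is only one plausible method, assumed for illustration, and it works on a single grade's percents where the study considered all grades. Both function names are hypothetical.

```python
def percent_correct(correct_counts, pupils_tested):
    """Percent of all pupils tested who completed each sentence correctly.

    Errors and omissions are treated alike because the base is every pupil tested.
    """
    return [100 * count / pupils_tested for count in correct_counts]


def split_into_balanced_series(percents):
    """Deal sentences into two series of roughly equal difficulty.

    Assumed method: rank sentences from easiest to hardest, then assign them
    in the repeating pattern first, second, second, first.
    Returns two lists of sentence indices.
    """
    order = sorted(range(len(percents)), key=lambda i: percents[i], reverse=True)
    series_one, series_two = [], []
    for rank, index in enumerate(order):
        if rank % 4 in (0, 3):
            series_one.append(index)
        else:
            series_two.append(index)
    return series_one, series_two
```

With four sentences answered correctly by 90, 80, 70, and 60 percent of a grade, this pattern places the first and fourth in one series and the second and third in the other, so that each series sums to 150.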


TABLE V. DISTRIBUTION OF TOTAL SCORES IN THE SENTENCE 
VOCABULARY SCALE 


SCORE                      TOTAL SCORE FOR GRADE
          III | IV | V | VI | VII | VIII | IX | X | XI | XII

| jo 4 : an: ee | 

5— 9 18 io wT | 

10—14 19 Ae ee 

15—19 | 27 10 et 2 1 

20—24 20 21 20 3 2 

22 Ody | Sie ||, eye 19 2; 1 1 

30-34 | 6 | 46 | 40 | 37 | 16 5 0 

35—39. | 7 21 eS O 28 24 abil 6 3 | 

A ae a 2a 56 31 rax0) i) als} 4 MN al 

45—49 S| lems 13 Se SOM 2k | 19 8 6 4 

50—54 | ber) | 7 18 24 | 24 | 20 16 8 t5) 

55—59 Le me 9 19 SYA | 43} Pa | PAD |) als 

60—64 1 LAP TSU IOS fel Se a Ste 2. tm 

65—69 | 2 yl) AKG ile | PA || BS ee! 

70—74 | (3 5 ibs} tala uye ee 

io) | | 3 Ones 10 7 

80—84 | | | 1 hag 3 7 

85—89 | | | 1 9 ® 3 3 

90—94 2 1 0 
Total                 [illegible]
Average               [illegible] | [illegible] | 38.5 | 41.5 | 47.1 | 52.1 | 57.2 | 60.7 | 64.2 | 66.4
Standard deviation    10.5 | 9.0 | 9.5 | 9.5 | 10.5 | 10.5 | 10.5 | 10.5 | 10.5 | [illegible]
Median                [illegible] | 33 | 40 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | 64 | 66

Table VI shows the percent of pupils in each grade who
responded correctly to each sentence. The percents are larger 
in most cases than they would be, if a deduction had been made 
for the number of times a sentence was underlined correctly 
by pure chance. Such a deduction was not made. If the chance 
factor were not present to inflate the percents, there would be 
an appreciable percent of pupils having zero scores. If every 
child had tried each sentence, the lowest percent theoretically 
would have been 25. It might be said in the light of the percents 
in this table that ‘a little knowledge is a dangerous thing” be- 
cause the more mature high-school pupils, who tried to get some 
of the words by comparison of form and derivative roots, made 
more errors than the grade pupils who underlined purely at 
random. 
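
The deduction for chance that the text says was not made can be illustrated with the usual correction for guessing. This is a sketch under a strong assumption, namely that every pupil marks every sentence, so that 25 percent is the floor for four alternatives. It is not a computation performed in the study, and the function name is hypothetical.

```python
def chance_corrected_percent(observed_percent, choices=4):
    """Rescale a percent correct so that pure guessing maps to zero.

    Assumes every pupil attempted the sentence. With four choices the chance
    level is 25 percent; corrected = (observed - 25) * 100 / 75, floored at 0.
    """
    chance = 100 / choices
    corrected = (observed_percent - chance) * 100 / (100 - chance)
    return max(corrected, 0.0)
```

Under this assumption an observed 85 percent corresponds to 80 percent after correction, and any observed value at or below 25 percent corresponds to zero.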


One of the chief merits of the sentence vocabulary scale 
is the ease with which duplicate forms can be devised. The 
original 100 words contained in the Terman Vocabulary List 
were chosen by a random sampling method from the 1904 edition 
of Laird and Lee’s Vest Pocket Dictionary. Other lists of 100 
words can be selected by choosing words equally distant in the 
dictionary from those selected by Terman. The writer has al- 
ready chosen the first and second words preceding Terman’s and 
it is planned to present these in sentences at the first opportun- 
ity. Care should be taken in the derivation of duplicate forms 
to select the words in the sentences in such a manner that the 
ideas represented by the four completing words are approxi- 
mately of the same degree of abstractness as the key-word in 
the sentence. 
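
The selection of a duplicate word list at a fixed distance from Terman's words can be sketched as below. The dictionary is represented as an ordered list of headwords and Terman's selections as positions in that list; both representations, and the function name `duplicate_form`, are assumptions made for illustration only.

```python
def duplicate_form(dictionary_words, base_positions, offset):
    """Pick the word lying `offset` entries away from each base word.

    dictionary_words: headwords in dictionary order (assumed representation).
    base_positions:   positions of the originally sampled words.
    offset:           -1 for the first word preceding each base word,
                      -2 for the second word preceding, and so on.
    """
    selected = []
    for position in base_positions:
        target = position + offset
        if not 0 <= target < len(dictionary_words):
            raise IndexError("offset runs past the ends of the dictionary list")
        selected.append(dictionary_words[target])
    return selected
```

An offset of -1 corresponds to the first words preceding Terman's, and an offset of -2 to the second words preceding, the two lists the writer reports having already chosen.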


TABLE VI. PERCENT OF PUPILS MARKING VOCABULARY SENTENCES
CORRECTLY


SERIES I

NO. OF                                  GRADE
SENTENCE    III | IV | V | VI | VII | VIII | IX | X | XI | XII
1 85 | 938 | 97 | 96 | 98 | 95 | 97-/100 | 98 | 100 
2 ; 81 | 98 | 98 | 99 |100 | 938 | 99 | 97 | 99 |100 
3 | 80 | 93 | 95 | 96 |100 | 96 | 98 | 95 | 98 | 99 
4 19 282° 1°85" | 90 1-87 | -86;| 90:| 94 | 88 | 85 
5 12 | .91 |. 92\-97 | 97 | 96°1299 1100 |100 | 100 
6 | 68 | 92 | 88 | 91 | 96 | 98.| 98 | 94 | 94 | 99 
7 | 62 | 92} 92 | 95 | 95 | 99 |100 |100 |100 |100 
8 |} 61 | 92 | 88 | 96:/100 | 94 | 99 | 99 |100 |100 
9 62)) 68 | 84 |.88 | 87 |-90'|;96.). 96 | 95 | 97 
10 53 | 88 | 90 | 97 | 95 | 99 |100 |100 |100 | 100 
21 42 | 75 | 85 | 95 | 97 |100 | 99 |100 | 98 | 100 
12 49 | 74 | 88 | 95 |100 | 97 | 99 |100 | 99 | 100 
13 A4 | 74 \*84 | 89 |.84 | 88 | 90:1 95 | 95° 1| 97 
14 | 63°] 73 | 57 | 74. |-76 | 90'|..98 | 82 |} 86 | 89 
15 51 | 64 | 69 | 91 | 95 | 98 | 96 | 99 | 99-/100 
16 32°} 72°] 64 |-90 | 86 |-96)| 99° |'97 | 99 | 99 
17 a5 | 68-1 56 | 78 | 84 | 90'|.90°) 84 | 94 | 97 
18 38 | 67 | 75 | 91 |-86 | 91-| -93:| 97 | 96 |° 98 
19 26 W621 72 | 76), 82° | 841)-80.) 68 | 84 1° 85 
20 S2 a) e4GaleGl 1677. | S82 | S20 iSO al 79 deel 179 
21 18 | 22 | 48 | 59 | 73 | 84 | 90 | 90 | 95 | 97 
22 Meet 29S ESO BT 70 RL" 1811479 ee. ers 
23 36.) 05671 45 745. 7. 5S: | (7271.98 | 73 | 79-\ 88 
24 19°) 42 |-45 | 61°|-76 | 76'| 91 | 92° | 98 | 99 
25 26uet Ta S3 eA? P69 Ne 67 72 71 79 Wy 90 
26 PO OOM OTe AS eo bGee 64 111s 82 | 85) ie 88 
27 iw46 | 48. 1-48 1 61 B47) 68 |. 62 | 89) 45.1] 61 
28 Peto edhe 34 AO G8 6911, 772 85 98 
29 | 86 | 29 | 34 | 40 | 32 | 6O | 59 | 52 | 44 | 51 
30 PAG toe eoe 24 le 4868 9) 74s) 78 oa sy 
31 (elG ie 19 Ne24 | 83 AP e627 .61 | 66 4, 65 | 88 
32 ee so wet or 92 1645 4) 43) G14) 7541) 72 88 
33 e281 16 \eat | 40 1°48 1-45-63 57 | 50 | 68 
34 LOM LIA LT 20, 124-181 166 |) 64a 54) 69 
35 Pe Omet4e ies Wiel Tale eto 27. (51.152 ie 67 177 
36 Sha elb ee Al OT edd (ead 51) 26 ar24 | 49 
37 iplootets Ge talei9 ies] (584517462) 24138.) 51 
38 else eoleie 2s alee. 629) sa0 "047 | 44 e502 56 
39 25 | 88 | 23 | 26 | 87 | 28 | 39 | 44 | 48 | 67 
40 ete cowed i 89 8851) 28h ee8 1) 24-21 1.45 
A1 fet samt ime lO 21 e290 27 89 18"), 26 1h 40 
42 Peete a 28 See 19 a eosei 191 17" |. 29 
43 Moo al eeomie tem 28) Weoley 1491488 17 627 
44 42 ek O We $8 (etd 2) tS be t6* | 12.9199 4) 33 
45 Om O mma O Mesh ue OOkt ee e221 96.) 981) 80 
46 Gu eth e194) ee22 "1805 133° ))29 1-18 }86 | 36 
AT 9 SVT T e150 ekS 1287/31 1-16 5 | 28 
48 Ft STM NESS mos tie ce eies0) | «86° 19 5 27 
49 a3 OO! etAC ets | 17 AN #3510. 1-15 
50 PIS oe he 7 | 20 | 15 | 84 | 10 10 | 14 
TABLE VI—(Continued)


SERIES II

NO. OF                                  GRADE
SENTENCE    III | IV | V | VI | VII | VIII | IX | X | XI | XII
96 | 98 |100 | 99 | 99 | 89 /100 |100 |100 | 100 

C—THE TIME ELEMENT


One of the important considerations in selecting scales 
is the time needed for their administration and evaluation. 
Some group scales are so arranged that it takes a minimum of 
time to give them as well as to score them; while others go to the 
other extreme. Table VII presents briefly the approximate 
amount of time required to give those considered in this study. 


TABLE VII. TIME CONSUMED IN ADMINISTERING THE SCALES 


SCALE                     MINUTES
Otis                         70
Classification               50
Virginia Delta I             30
Primer                       25
Whipple’s Group              80
Sentence Vocabulary       20—40*


* This scale is given without a time limit; but the time varies from about forty minutes
in the third grade to twenty minutes or less in the twelfth.


These times are approximately those used in administer- 
ing the scales at Champaign. Of course, much time can be wasted 
by inefficient routine methods. The papers should be distributed 
and collected quickly. No petty interruptions should be permit- 
ted while the pupils are working. Under these conditions the 
time required for the administration of the different scales will 
be approximately as indicated above. 


After the intelligence scales have been administered 
much time is needed for scoring the papers. Many do not 
realize the tediousness of this work. It often costs more to 
score the papers and evaluate the results than to purchase the 
test materials. Data are presented here to show the approxi- 
mate rate at which the scales used in this study were scored. 
This work was done by trained clerks who used stencils wher- 
ever possible. Few teachers will approximate this rate of work 
when they first attempt to score similar papers. The numbers 
of papers scored per hour by our clerks are indicated in Table 
VIII for the different scales. 
TABLE VIII. RATE OF SCORING BY CLERICAL WORKERS


SCALE                                 NO. SCORED PER HOUR
Otis                                          13
Classification                                15
Virginia Delta I                              20
Primer                                        35
Whipple’s Group Test                           6
Sentence Vocabulary (one series)              40


The time which may be devoted to the analysis of data 
after papers are scored is exceedingly variable. It depends to 
a large extent upon the purpose to be served. However, approxi- 
mately the same amount of time should be allowed for this as 
is needed for scoring. With the exception of Whipple’s Group 
Test all the scales yield a total score. This fact places them on 
an equal basis after the total scores have been obtained. In 
other words, from this point on, each scale will require about 
the same amount of time—unless an analysis is made of some or 
all of the individual tests which compose the scales. This last 
process will generally be unimportant for school purposes. 
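
The clerical time implied by these figures can be estimated as sketched below. The rates are those read from Table VIII for trained clerks, and the allowance for analysis equal to the scoring time follows the rule of thumb given above. The function name, the dictionary, and the `analysis_factor` parameter are hypothetical conveniences, and untrained scorers should be expected to take longer.

```python
# Papers scored per hour by trained clerks, as read from Table VIII.
PAPERS_PER_HOUR = {
    "Otis": 13,
    "Classification": 15,
    "Virginia Delta I": 20,
    "Primer": 35,
    "Whipple's Group Test": 6,
    "Sentence Vocabulary (one series)": 40,
}


def clerical_hours(scale, papers, analysis_factor=1.0):
    """Estimate hours for scoring plus analysis.

    analysis_factor is the analysis time as a multiple of scoring time;
    1.0 reflects the suggestion that analysis takes about as long as scoring.
    """
    scoring_hours = papers / PAPERS_PER_HOUR[scale]
    return scoring_hours * (1 + analysis_factor)
```

On these assumptions, 70 Primer Scale papers would call for about two hours of scoring and two more of analysis.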


D—COMPARISONS OF TOTAL SCORES 


I. Correlations with Scholarship—The correlations be- 
tween the intelligence scores and scholarship are shown in 
Table IX. These values at first glance would seem to imply that 
the scales are not very reliable, that they do not adequately 
measure the mental characteristics important for school success. 
This might be the case, if the judgments of scholarship were en- 
tirely adequate. It will be worth while to consider this point 
briefly. 


The teachers were instructed to rate the children in 
scholarship on a special sheet. Accompanying this sheet was a 
set of mimeographed instructions which directed that letter 
ratings should be so distributed that the teacher of a normal 
class would give 5 percent of the class A’s, 20 percent B’s, 50 
percent C’s, 20 percent D’s, and 5 percent E’s. If a group was 
abnormal, the teacher was asked to rate the children in compari- 
son with all children of the same sex, race, and age. If these 
instructions had been carefully followed, the correlation with 
each scale would have been higher and more significant. 


TABLE IX. CORRELATION BETWEEN SCORES IN THE INTELLIGENCE 
SCALES AND TEACHERS’ SCHOLARSHIP RATINGS 


Primer | Vocabulary | Va. Delta I | Classification 


.42±.04 
.30±.05 
.30±.05 .54±.05 .57±.06 


.50±.04 .45±.05 
.42±.08 .56±.04 
.50±.04 .69±.03 
.45±.04 .71±.04 
.46±.04 .58±.05 
.42±.05 
.27±.06 
.59±.04 
.53±.04 


2 Cases were too few to be significant. 
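

The second figure attached to each coefficient in Table IX is its probable error. The text does not say how these were computed; the sketch below assumes the conventional formula PE = 0.6745(1 - r^2)/sqrt(N), and the number of cases used in the example is hypothetical.

```python
import math


def probable_error_of_r(r: float, n_cases: int) -> float:
    """Probable error of a correlation coefficient.

    Assumes the conventional formula 0.6745 * (1 - r**2) / sqrt(N);
    the source does not state which formula was actually used.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n_cases)


if __name__ == "__main__":
    # Hypothetical example: r = 0.50 computed from 100 pupils.
    print(round(probable_error_of_r(0.50, 100), 3))
```

Under this formula the probable error shrinks as the coefficient rises and as the number of cases grows, which is why coefficients from small classes carry the larger errors.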


The teachers, however, were not able to follow the in- 
structions very closely. This fact is shown by Tables X and XI 
which present the distributions of the scholarship rating for 
the first and fourth grades respectively, these grades having 
been taken as typical. Among the different schools it is evident 
even without converting the number of ratings into percents 
that there are wide deviations from the suggested percentage 
distribution of rating. Even when the ratings for all the schools 
are combined and converted into percents the discrepancy be- 
tween the actual and theoretical distribution is still evident. It 
is clear, for example, that the first- and fourth-grade teachers 
gave a great many more A’s than would have been expected. 
The number of C’s was appreciably below the standard number, 
while the numbers of B’s and D’s (at least in the first grade) 
were of about the right order of magnitude. 


The average grade for each school was computed by al- 
lowing the customary ratings of 5, 4, 3, 2, and 1 respectively for 
the letters A, B, C, D, and E. These averages conceal a great 
deal. A teacher for example, may give too many A’s but if she 
balances them by giving too many E’s the average may turn 
out to be 3 and the impression may be created that the distribu- 
tion was correct. Nevertheless the averages do serve to indicate 
whether there is a constant bias on the part of the teacher in 
question in virtue of which she rates everybody too high or too 
low. There is, in the first grade, a slight tendency for teachers 
to rate their children above 3, that is, above the expected average. 
This tendency is considerably more marked in the fourth 
grade. These deviations from the expected average are not at 
all accounted for by the scores in the mental tests at the schools 
in question. Median scores in the Primer Scale are shown in 
Table X and those for the Vocabulary Scale are shown in 
Table XI. 
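

The point that an average can conceal a faulty distribution may be illustrated with a short computation. This is a sketch under stated assumptions: the fourth-grade counts are the all-school totals of Table XI, the "lopsided" distribution is invented for illustration, and the function names are not from the original study.

```python
# Customary numerical values of the letter ratings.
POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

# Distribution the teachers were instructed to approximate (percent).
EXPECTED_PERCENT = {"A": 5, "B": 20, "C": 50, "D": 20, "E": 5}


def average_rating(counts: dict) -> float:
    """Mean rating when A..E are scored 5..1."""
    total = sum(counts.values())
    return sum(POINTS[letter] * n for letter, n in counts.items()) / total


def percent_distribution(counts: dict) -> dict:
    """Percent of all ratings falling on each letter."""
    total = sum(counts.values())
    return {letter: 100.0 * n / total for letter, n in counts.items()}


# All-school totals for the fourth grade (Table XI).
fourth_grade = {"A": 34, "B": 53, "C": 69, "D": 22, "E": 8}

# Hypothetical teacher who gives too many A's and too many E's.
lopsided = {"A": 20, "B": 10, "C": 40, "D": 10, "E": 20}

if __name__ == "__main__":
    print(round(average_rating(fourth_grade), 1))
    print(average_rating(lopsided))
    for letter, pct in percent_distribution(lopsided).items():
        print(letter, pct, "expected", EXPECTED_PERCENT[letter])
```

The invented distribution averages exactly 3 although it gives four times the suggested share of A's and E's, whereas the fourth-grade totals show the upward bias directly in the average.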


TABLE X. DISTRIBUTION OF TEACHERS’ SCHOLARSHIP RATINGS 
FIRST GRADE 


          NUMBER OF RATINGS AT SCHOOL:        RATINGS AT ALL SCHOOLS 
RATING    1     2     3     4     5     6        No.    Percent 
2 ie eee ee 4 7438) neta 
Sy a 5 0 7 1 36 19 
i i 8 14 it: 7 64 33 
Poa ~7- aihersitiro-g 6 ol Partagas 
Sh i ey 4 3 As ty 73 21 | 11 
— —_— ———_ ——_——__ ——_——_ | 
Average 281>33) 81) 3 gindeasb eed | 
Median Score 48 40 43 41 | 46 / 
Primer Scale 44 


TABLE XI. DISTRIBUTION OF TEACHERS’ SCHOLARSHIP RATINGS 
FOURTH GRADE 


                 NUMBER OF RATINGS AT SCHOOL:        RATINGS AT ALL SCHOOLS 
RATING            1     2     3     4     5     6        No.    Percent 
A                10     6    13     4     0     1         34 
B                 3     8    16    17     5     4         53 
C                19    14     8    10    13     5         69       37 
D                 5     7     3     2     3     2         22       12 
E                 0     1     0     0     6     1          8        4 
Average         3.5   3.3   3.9   3.7   2.6     3        3.4 
Median Score 
Vocab. Scale     29    33    30                30         32 


The effect of these individual variations on the scholar- 
ship ratings by the teachers is to lower the coefficients of corre- 
lation. But another factor which diminishes the correlations 
has to do with effort. It is well known that many school children 
are working far below the limit of their ability. If a superior 
child does not apply himself his scholarship rating may be 
mediocre or even poor, although his intelligence score may be 
high. Indeed, the novelty of the test situation and the shortness 
of the effort required frequently combine to secure a performance 
which corresponds more closely to actual ability than does 
the sustained routine performance of the classroom. Again, the 
exceptional industry of a child of ordinary ability may place 
him among the best, or at least much above his companions of 
equal general ability. Other factors, such as sickness, irregular 
attendance, and change of schools may cause a child’s scholar- 
ship to be rated much below what it would be under normal 
conditions. 


Under these circumstances, the distribution of the coefficients 
of correlation for the different tests reveals a reasonably 
high correspondence with scholarship ratings. (See Table XII.) 
The average for the entire group is +0.462. If the correlations 
for the Primer Scale are omitted, the average is a little higher, 
namely, +0.497. 


TABLE XII. DISTRIBUTION OF COEFFICIENTS OF CORRELATION 
BETWEEN INTELLIGENCE SCORES AND SCHOLARSHIP RATINGS 


COEFFICIENTS      NUMBER 
.70-.79 
.60-.69              4 
.50-.59             12 
.40-.49              9 
.30-.39              4 
.20-.29              3 

Average           .462 


It was thought that it might be significant to combine the 
scores of two somewhat dissimilar scales like the Virginia Delta 
I and the Sentence Vocabulary.⁵ The combined scores of these 
two scales ought to show a higher correlation with scholarship 
than either alone, if they measure different phases of intelligence 
accurately, and to the extent that the ratings of scholarship are 
reliable. The resulting coefficients of correlation are as follows: 
Grade III, 0.64; Grade IV, 0.44; Grade V, 0.52; Grade VI, 0.54; 
Grade VII, 0.52; and Grade VIII, 0.54; average, 0.53. These 
figures do not show that the combination has very materially 
raised the correlations. 
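

The rule of combination given in the footnote (drop Test 3 of the Virginia Delta I, double the Sentence Vocabulary score, and add) may be sketched as follows. The function name and the pupil's scores are hypothetical; only the rule itself comes from the text.

```python
def combined_score(delta_test_scores: dict, vocabulary_score: float) -> float:
    """Combine Virginia Delta I and Sentence Vocabulary scores.

    delta_test_scores maps the test number (1 to 6) to the pupil's score.
    Test 3 is dropped as not discriminative, and the vocabulary score is
    doubled in an attempt to give the two scales about equal weight.
    """
    delta_total = sum(
        score for test, score in delta_test_scores.items() if test != 3
    )
    return delta_total + 2 * vocabulary_score


if __name__ == "__main__":
    # Hypothetical pupil.
    pupil_delta = {1: 10, 2: 8, 3: 5, 4: 12, 5: 7, 6: 15}
    print(combined_score(pupil_delta, vocabulary_score=30))
```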

5 The scores were combined by dropping the score made in Test 3 of the 
Virginia Delta I, multiplying the Sentence Vocabulary score by two, and 
finding the total. This procedure attempted to give equal weight to both 
of the scales. Test 3 was dropped because it is not discriminative. 


Our evidence, as well as that exhibited in other investigations, 
tends to show that while scholarship and intelligence are 
by no means independent, their relationship is also by no means 
perfect. Each is affected by conditions which do not affect the 
other to the same degree. Even if scholarship is accurately 
judged by teachers, perfect correlation cannot be expected. It 
is probable that a coefficient of much more than +0.60 between 
mental test results and estimates of scholarship would mean 
either that the test or the estimates were faulty. The test might 
be such that success in it depended too much upon schooling; 
or the estimates of scholarship might be too greatly influenced by 
the notion of natural ability. 


II. Correlations between the Scales—The correlations 
between the different scales administered in this study are inter- 
esting and suggestive. As many as the data permit are shown 
in Table XIII. In some cases results from the same grade in 
several schools were used; in others the correlations had to be 
determined for the grade of one school only. Determining these 
values for a single class applies a much more rigid standard to a 
scale than would be the case, if correlations were computed from 
the combined results for different classes in the same grade or 
for different grades. Note, for example, that the correlations 
of the Sentence Vocabulary Scale with the Virginia Scale Delta 
I are without exception higher for all schools (column 2) than 
they are for one school (column 3). This is significant. In all 
probability it implies that the true correlation between the 
Sentence Vocabulary Scale and Virginia Delta I is appreciably 
higher than is here indicated. 


In general, higher correlations are found where results 
from several different grades are used. This is because such a 
selection gives a greater spread of abilities. The two combina- 
tion correlations presented in Table XIII for the Otis Scale re- 
veal this tendency. (See the last entries in columns 5 and 7.) 
This procedure has sometimes been adopted in studies of mental 
tests. But a high correlation of this sort is not so significant 
as a high correlation secured from the more homogeneous ma- 
terial of a single grade. Since intelligence scales will be most 
useful, if they distinguish between the children of a single grade, 
this rigid test will be employed in the consideration of the merits 
of the different scales. This criterion should not be confused, 
however, with the procedure of others who compute correlations 
by using data from the combination of several grades. 
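

The effect of pooling several grades can be seen in a small invented example. The scores below are hypothetical and chosen only so that the arithmetic is easy to follow; the Pearson coefficient is computed directly from its definition.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores on two scales for four pupils in each of two grades.
grade_4 = ([10, 12, 14, 16], [13, 11, 15, 13])
grade_8 = ([30, 32, 34, 36], [33, 31, 35, 33])

r_grade_4 = pearson_r(*grade_4)
r_grade_8 = pearson_r(*grade_8)
r_pooled = pearson_r(grade_4[0] + grade_8[0], grade_4[1] + grade_8[1])

if __name__ == "__main__":
    print(round(r_grade_4, 2), round(r_grade_8, 2), round(r_pooled, 2))
```

Within either grade the two scales agree only weakly, yet the pooled coefficient is very high, because most of the pooled spread is simply the difference between the grades. This is why a coefficient from a single grade is the more rigid test.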

TABLE XIV. DISTRIBUTION OF INTER-SCALE CORRELATIONS 


CORRELATIONS      ALL TESTS 
.80-.89              3 
.70-.79              6 
.60-.69             12 
.50-.59              3 
.40-.49              1 
.30-.39 


The high correlation values that have been obtained where 
two scales have been given to the children in a grade seem to 
indicate that these scales in the main are measuring much the 
same thing. When the diverse character of the measuring in- 
struments is considered this fact is somewhat remarkable. Of 
course, the values are all low enough to lead one to be some- 
what conservative in drawing conclusions concerning individual 
pupils from a single test. 

III. Reliability of Total Scores—The correlations discussed 
in the two preceding sections may be considered to indicate 
the worth of mental tests from the point of view of relationships. 
These correlations, however, do not reveal many of the 
things which one would like to know about the different intelli- 
gence scales. It may be a valuable thing to have tests which 
correlate highly with one another or it may be the reverse. 
Further, it may be worth while to have tests which correspond 
closely to the scholarship ratings made by teachers; but on the 
other hand, if the scholarship ratings, given by the teachers 
under the conditions described above, do not forecast the real 
possibilities of pupils, high correlation with such scholarship 
ratings may not be either desirable or informing. It is con- 
ceivable that mental scales may be devised which to a greater 
extent than is true of these scales will direct attention to pupils 
who are brilliant, average, or mediocre, in a way that will enable 
teachers to develop their talents. It may be that the methods 
of education now in use are pedantic or that intelligence scales 
measure qualities which are highly desirable in life, but which 
do not function in school work. These questions cannot be 
settled here; but it will be worth while to consider the scales 
from another point of view, namely that of their power to dis- 
criminate between different intelligence levels. 


Much of the discussion from this point on will make the 
assumption that there is a difference in intelligence level between 
grades: that third-grade children are, in general, more 
mature in intelligence than second-grade children, that fourth- 
grade children are still more mature than third-grade children, 
and so on. As the higher grades are reached, this difference 
probably decreases in absolute value. Correspondingly, the 
grade intervals shown by a test at different grade levels should 
probably not be equal. The differences, however, are taken to 
be appreciable; and good tests ought to reveal them clearly. 
Further, it should be possible to magnify these amounts some- 
what by the use of scales especially devised to discriminate at 
the higher ranges. This assumption is approximately that of 
Binet, Terman, and others who developed the individual intelligence 
scales. There is this difference, however: the earlier 
workers made their groupings upon an age basis. The groupings 
here made are upon the school-grade basis. This basis was 
used, first, because the age groups are not completely represent- 
ed in school. The least intelligent children of ages five, six, and 
seven have sometimes not entered school, while the more brilliant 
ones beyond the age of fifteen have, in many instances, completed 
the public school. Consequently, we can use twelve unselected 
grade groups, while we would not have that number of 
comparable age groups. The grade basis was used, second, 
because it is more serviceable. In the schools children are classi- 
fied by grades, not by ages. If a test is given, it is given to 
the children of the same grade, not to those of the same age. 
Standards for the grades and differences between grades are 
therefore more immediately useful than similar facts on the 
basis of age. 


With the assumption made in the preceding paragraph 
we may compare the different intelligence scales on the basis of 
their power to discriminate between grades. If the fourth 
grade is more mature than the third there should be a difference 
between the third grade and the fourth grade in intelligence 
scores. That scale which reveals the most reliable differences be- 
tween averages for the different grades may be considered to be 
the most discriminative. These facts have been computed for 
the different scales and are presented in Table XV. 




TABLE XV. DIFFERENCES BETWEEN GRADE AVERAGES WITH RELIABILITIES (PROBABLE ERRORS) FOR TOTAL 
SCORES IN EACH SCALE 


DIFFERENCES BETWEEN AVERAGES 


GRADES = 
Primer Vocabulary Otis SLASEUDE ATION Virginia Nee ed ha 
Wisconsin Champaign | 
> Lee. Th 
Ilé& Ill 
Slii&as EV Ai lpteeey eet () ees ann ne epee ME cP) cancs. ap cccchadiacenniin 72h Neel Ves eed 39.7+3.3 
IV & V Ae sll, apatites tees Sere aM pe ane tei Ose val: 19.4+42.3 
Vé& VI 8.0+0.6 | Pas yf aes A) 20 .0==402 28.5+2.3 
VI& VII 5.6220.7 WS yey. i ig20=-2.1: 9 2=-1.2 24.2==2.6 
VII & Vill 5.00.7 15.0+4.1 IBS Yenc Layee LOSS 5 eH 122 A TEEZ-S 
VIIT& IX Dee 0.8 DDE =2.D Web ==166 T1C) a eset b 
IX & xX S:02= 0,9 Aie=2eD feoa= leo 6.2+3.6 
NGC XL 3.50.9 Viper 5) —2.4+1.7 9.6+4.0 
XI& XII ee 0.9) SOL=2.0 10.2+1.6 7.3+4.0 
Since the different scales involve different numbers of 
units, a unit of one (e.g., the Vocabulary Scale with a possible 
score of 100) is not equivalent to a unit of another (e.g., the 
Otis Scale with a possible score of 230). The differences, how- 
ever, between the grade averages shown in Table XV may be re- 
duced to the same basis, either by dividing the grade differences 
by their probable errors (thus making the differences compara- 
ble), or reciprocally, by dividing the probable errors by the 
differences. This latter procedure gives us the so-called coeffi- 
cient of variability. It is commonly held in statistical circles 
that a quantity should be at least three times its probable error 
to be worthy of statistical consideration, or, in other words that 
the probable error should be no more than one-third of the 
quantity. 
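

The computation just described may be sketched briefly. The function names are illustrative assumptions; the three pairs of values are differences and probable errors taken from Table XV, and the limit of one-third is the standard stated above.

```python
def coefficient_of_variability(difference: float, probable_error: float) -> float:
    """Probable error of a grade difference divided by that difference."""
    if difference == 0:
        # A zero difference gives no meaningful coefficient.
        raise ValueError("coefficient is undefined when the difference is zero")
    return abs(probable_error / difference)


def meets_standard(difference: float, probable_error: float,
                   limit: float = 1 / 3) -> bool:
    """True when the difference is more than three times its probable error."""
    return coefficient_of_variability(difference, probable_error) < limit


# (difference between grade averages, probable error) from Table XV.
examples = {
    "Vocabulary, grades V and VI": (8.0, 0.6),
    "Otis, grades VII and VIII": (15.0, 4.1),
    "Classification (Champaign), grades X and XI": (9.6, 4.0),
}

if __name__ == "__main__":
    for label, (diff, pe) in examples.items():
        cv = coefficient_of_variability(diff, pe)
        print(f"{label}: {cv:.3f} reliable={meets_standard(diff, pe)}")
```

Dividing in the other direction (difference by probable error) carries the same information; a ratio above three corresponds to a coefficient below 0.333.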


With this standard we may examine the coefficients of 
variability presented in Table XVI remembering that the small- 
er they are the more reliable are the grade differences given in 
Table XV and that according to the standard they should be 
less than 0.333. Comparing the values grade by grade, it can be 
seen readily that some of the scales are much more discriminative 
than others. The best values for grades III to VI are 
shown by the Virginia Scale. From grades VI to XII the Vocabulary 
Scale seems to show the best discrimination. It is somewhat 
unfair to compare the scores secured in Champaign with 
the results furnished by Dr. Theisen for the Classification Test 
in Wisconsin, for it is altogether possible that a measurement 
of the children in Wisconsin with the Otis and Vocabulary scales 
would reveal a greater degree of discrimination than these tests 
revealed in Champaign. However, the comparatively good 
values shown by the Wisconsin figures are interesting and suggestive. 
To a certain extent the large (and hence unfavorable) 
coefficients yielded for the Otis and Classification scales by the 
Champaign data are due to the fact that the number of children 
tested was less than 100 in every grade. But this is not the sole 
reason. The numbers of children who took these tests were no 
larger in the grades below the high school, yet the coefficients 
of variability are much smaller for those grades. In the high- 
school grades only one of the eight coefficients of variability for 
the Otis and Classification scales in Champaign is less than 
0.333. This would imply, so far as our data permit an inference, 
that neither test is as reliable for high-school work as one would 
wish. 


TABLE XVI. COEFFICIENT OF VARIABILITY OF DIFFERENCES BETWEEN 
TOTAL SCORES FOR SUCCESSIVE GRADES 


                                             CLASSIFICATION                    VIRGINIA & 
GRADES        PRIMER   VOCABULARY   OTIS     Wisconsin   Champaign   VIRGINIA   VOCABULARY 
I & II        0.068 
II & III      0.117 
III & IV               0.068                                         0.056      0.083 
IV & V                 0.155                                         0.108      0.118 
V & VI                 0.075                             0.080       0.060      0.080 
VI & VII               0.125                             0.155       0.130      0.107 
VII & VIII             0.140        0.273    0.111       0.196       0.235      0.158 
VIII & IX              0.156        0.454    0.091       0.237 
IX & X                 0.257        0.531    0.178       0.580 
X & XI                 0.257        0.342    0.708       0.416 
XI & XII               0.409        0.787    0.156       0.547 
The discriminative power revealed by the scales for wider 
ranges of grades is also interesting. A few of the many possible 
facts of this sort are presented in Table XVII. Observe that 
the grades set up in this table have reference to the elementary 
school, the junior high school, and the senior high school. The 
greatest discriminative ability is shown by the sum of the Vir- 
ginia and Vocabulary scales. The Virginia Scale by itself seems 
to be twice as discriminating as the Vocabulary Scale for the 
elementary school. When the Vocabulary, Otis, and Classi- 
fication scales are compared for grades VI to IX, they rank in 
that order. For the last three years of the high school the 
Classification Test is first, the Vocabulary second, and the Otis 
third. These facts imply not only that some of the scales are 
more discriminative than others, but also that we may secure 
better scales than any we now have. 


E—ANALYSIS OF THE INDIVIDUAL TESTS IN THE INTELLIGENCE 
SCALES 


I. Individual Test Scores—Each of the instruments used 
in this investigation for measuring intelligence (except the 
Sentence Vocabulary) consists of several series of questions or 
things to do. Each of these instruments without regard to its 
actual title may be called a scale; and each separately organized 
group of questions and things to do may be called a test. Thus, 
the Otis Group Intelligence Scale has ten tests; Whipple’s Group 
Test for Grammar Grades has six tests; the Virginia Delta I has 
six tests; the Classification Test has eight tests; and the Primer 
Scale has four tests. Since the total score on a scale is to be 
taken as indicative of intelligence, the theory is that each of the 
component tests shall “tap” important elements of intelligence, 
and that the score on each test shall enter into the total for the 
scale to a degree that will give the proper emphasis to the element 
or phase of intelligence to which the test relates. 


TABLE XVII. DISCRIMINATION BETWEEN LARGER GRADE RANGES FOR INTELLIGENCE SCALES 


                      GRADES III TO VIII           GRADES VI TO IX              GRADES IX TO XII 
SCALE                 Difference    Coefficient    Difference    Coefficient    Difference    Coefficient 
                      between       of             between       of             between       of 
                      Averages      Variability    Averages      Variability    Averages      Variability 
Vocabulary            34.8±1.2      0.034          15.7±0.7      0.044          9.2±0.9       0.098 
Virginia              69.0±1.3      0.019 
Virginia and 
  Vocabulary          129.5±0.3     0.008 
Otis                                               35.7±2.3      0.064          15.3±2.1      0.137 
Classification: 
  Champaign                                                      0.067                        0.091 
  Wisconsin                                                                     ±1.2          0.079 

The tests of which each scale is composed are as follows: 


Otis Group Intelligence Scale 

I. Following Directions 
II. Opposites 
III. Disarranged Sentences 
IV. Proverbs 
V. Arithmetic 
VI. Geometric Figures 
VII. Analogies 
VIII. Similarities 
IX. Narrative Completion 
X. Memory 


Classification Test 

I. Following Directions 
II. Synonym-Antonym 
III. Arithmetic 
IV. Common Sense 
V. Completion 
VI. Analogies 
VII. Number Completion 
VIII. Information 


Virginia Delta I 

I. True-False 
II. Arithmetic 
III. Picture Completion 
IV. Synonym-Antonym 
V. Common Sense 
VI. Information 


Whipple’s Group Test for Grammar Grades 

I. Arithmetic 
II. Completion 
III. Substitution 
IV. Reasoning 
      Part I 
      Part II 
      Part III 
V. Punched-Hole Test 
VI. Proverbs 


Pressey Primer Scale 

I. Dot Pattern 
II. Classification 
III. Form Board 
IV. Absurdities 

Now, it will probably occur to anyone who has considered 
the foregoing material thoughtfully that the various tests which 
compose the different scales are not likely to be of the same value. 
To determine the extent to which this is the case, distributions 
were made for each test by grades. These distributions and in- 
terpretations were enlightening, but the limitations of this 
bulletin permit the presentation of no more than the general 
features. 


A marked difference in the curves of distribution was 
shown by the several tests. Differences were also shown when 
the school-grade distributions were compared with each other 
for tests of the same kind occurring in different scales. In 
other words, some of the tests were not suited to the grades in 
which they were used, were too hard or too easy or too irregular. 
These facts will be presented more in detail in the next section. 


II. Differences between Successive Grade Averages— 
The differences between the grade averages for each of the 
tests were computed. These revealed wide divergencies, show- 
ing that some of the tests were poorly adapted to the work that 
was expected of them. 


The Otis Scale contains tests which are of small diag- 
nostic value when discrimination between successive mental 
levels is sought. A number of inversions, cases where the score 
in a test decreased with the next higher grade, were noticed. 
While these might be due to some degree to an insufficiency of 
cases, this would not fully explain the fact that some tests 
showed inversions while others did not. The inversions suggest 
poorly constructed tests. For each test of the Otis Scale 
the difference between the average scores in the highest and 
lowest grades was determined. It was thus found that the 
amounts contributed to the difference between the scores for the 
entire scale varied markedly, being seven times as large for 
Test IV (Proverbs) and Test IX (Narrative Completion) as for 
Test VIII (Similarities). The amount of this difference for 
each test—i.e., of the difference between the average score of 
the lowest and highest grades concerned—may be taken as one 
indication of the discriminative power of the test. The tests 
in the Otis Scale, arranged in order from the most discriminative 
to the least according to the differences between the sixth- 
and twelfth-grade averages, are given in Table XVIII. As a 
whole, the Otis Scale showed more discriminative power in the 
grammar grades than in the high school. 


TABLE XVIII. DISCRIMINATIVE POWER OF TESTS. OTIS SCALE 


          DIFFERENCE BETWEEN     PROBABLE 
TEST      SIXTH AND TWELFTH      ERROR OF THIS 
          GRADE AVERAGES         DIFFERENCE 
IV              8.5                  0.5 
IX              8.4                  0.7 
III             8.2                  0.5 
II              6.7                  0.5 
VII             4.3                  0.5 
VI              4.0                  0.4 
V               2.6                  0.4 
X               2.0                  0.4 
I               1.8                  0.4 
VIII            1.2                  0.4 


The Classification Test is better organized than the Otis 
Scale when the power of the individual tests to discriminate between 
successive grades is used as a criterion. Although no 
more of the Classification Test were administered than of the 
Otis, there were in the former case fewer negative differences 
between the averages of successive grades. 


Moreover, judged by the larger recorded differences be- 
tween the performance of sixth- and twelfth-grade children, 
the component tests of the Classification Test are considerably 
were found as shown in Table XIX. Again, taken as a whole, 
this scale, like the Otis, is more discriminative in the grammar 
grades than in the high school. 


TABLE XIX. DISCRIMINATIVE POWER OF TESTS. 
CLASSIFICATION TEST 


DIFFERENCE BETWEEN | PROBABLE 
SIXTH AND TWELFTH ERROR OF THIS 
GRADE AVERAGES | DIFFERENCE 


1 
1 

Marked differences between the discrimination of tests of 
the Virginia Delta I Scale are also evident. Indeed, the incomplete 
picture (Test III) is apparently not suitable above the 
primary grades. Arranged in the order of the differences between 
averages for grades III to VIII the tests are as follows: 
Test VI, 21.9±0.4; Test I, 14.8±0.5; Test IV, 14.7±0.4; Test V, 
8.2±0.1; Test II, 6.3±0.1; and Test III, 1.1±0.2. 


There is very little difference in discrimination between 
the tests of the Pressey Primer Scale or between the two vocabulary 
scales. Such differences as exist do not materially affect 
the use of either of these measuring instruments. 

In general it may be said that the present scales have been 
arranged without a careful analysis of their component parts. 
They are poorly balanced when the individual tests are consid- 
ered, and the comparative success that attends their use at pres- 
ent is due more to the homogeneity of human mentality than to 
the scientific derivation of the measuring instruments. 


III. The Coefficients of Variability of Individual Tests— 
The influence of different numbers of units in the various in- 
dividual tests may be eliminated by reducing the values, which 
have been the basis for the discussion in the two preceding sec- 
tions, to coefficients of variability in the same way that we calcu- 
lated the coefficients of variability for differences in total scores. 
There is one point which must be considered, however, and that 
is that the probable error of the difference between two averages 
is affected in a constant direction by the number of cases in- 
volved. Although the differences between any two averages 
probably will not vary markedly with double the number of 
cases, the probable error of that difference will be smaller, if 
twice the number of cases are studied. Where varying numbers 
of pupils have been tested by the scales, this fact introduces 
a certain element of unreliability into any comparisons which 
may be made between the different scales. This point applies to 
total scores as well as to the scores in the individual tests, al- 
though it was not mentioned in the section discussing total 
scores. Within the same scale, however, this fact ceases to 
operate because the same number of cases is involved in all the 
tests of a scale. 
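

The dependence of the probable error on the number of cases can be made concrete. The sketch below assumes the conventional formulas (probable error of a mean = 0.6745 times the standard deviation divided by the square root of N, and the probable error of a difference as the root of the sum of squares of the two probable errors); the standard deviations and group sizes are hypothetical.

```python
import math


def probable_error_of_mean(sd: float, n_cases: int) -> float:
    """Conventional probable error of an average: 0.6745 * sd / sqrt(N)."""
    return 0.6745 * sd / math.sqrt(n_cases)


def probable_error_of_difference(sd_a: float, n_a: int,
                                 sd_b: float, n_b: int) -> float:
    """Probable error of the difference between two independent averages."""
    return math.hypot(probable_error_of_mean(sd_a, n_a),
                      probable_error_of_mean(sd_b, n_b))


# Hypothetical grades with a standard deviation of 10 points each.
pe_small = probable_error_of_difference(10.0, 50, 10.0, 50)
pe_large = probable_error_of_difference(10.0, 100, 10.0, 100)
ratio = pe_large / pe_small

if __name__ == "__main__":
    print(round(pe_small, 3), round(pe_large, 3), round(ratio, 3))
```

Under these assumptions, doubling the number of cases in both grades multiplies the probable error by about 0.71, so the same difference between averages would yield a smaller coefficient of variability.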


The coefficients of variability for the individual tests of 
the six scales studied are presented in Tables XX to XXV. The 
wide variations between the individual tests which were noted 
briefly in our discussion of the differences between successive 
grade averages have disappeared. Reducing them 
all to the same basis, without reference to their effect on the total 
score, shows that they would be more nearly alike in discriminative 
power if the differences in weighting were equalized. Nevertheless, 
it can still be shown that some of the scales are much better organized 
than others. In the Otis Scale, for example, Test I shows 
a coefficient of variability for grades VI to IX (0.78) that is ten 
times as large as that shown by Test VI (0.076). An extreme 
range, such as this, is plainly not to the advantage of a scale. 
The variations in discriminative power between tests should not 
be considered as evidence that the tests are of no value. They 
merely mean that in the present form the tests are not well or- 
ganized for the work that they are to do. It is altogether possi- 
ble that a revision might introduce the changes that are needed. 


TABLE XX—COEFFICIENTS OF VARIABILITY OF INDIVIDUAL TESTS BY GRADES FOR THE OTIS SCALE 


Coefficients of Variability of Differences between Scores for Indicated Grades 


Test 

7&8 8&9 9&10 10&11 11&12 6&12 6&9 9&12 

I 7.0 0.3815 0.285 0.50 0.50 3.0 0.055 0.78 0.37 

II 0.50 0.50 Oeil 0.666 0,363 OTL 0.074 0.099 Oi 
III 0.80 0.296 4.0 | 0.40 0.125 | 0.357 0.060 | Op AT! 0.104 

IV 0.379 0.478 1.666 0.166 2.50 | 3.0 0.058 | 0.078 0.19 
V 0.277 5.0 0.187 0.50 0.875 | 0.363 0.153 | 0.096 0.897 

VI 0.250 0.454 0.50 0.30 1.0 | 0.30 | 0.025 0.076 alo) 
VII 1.20 0.187 0.60 0.750 1.50 | 2.0 0.116 0.086 0.854 

VIII 5.0 5.0 O2R2 0.375 Ose 1.50 0.3383 0.3822 aol 
IBS 1.0 Ouate 1-760 | 0.857 0.60 0.714 0.083 0.123 0.224 

».¢ 0.625 0.812 0.666 3.0 8.0 a 0.20 0.213 1.85 


a The difference was zero. This made it impossible to compute a coefficient of variability that would mean anything to any but a mathematician. 


TABLE XXI—COEFFICIENTS OF VARIABILITY OF INDIVIDUAL TESTS BY GRADES FOR CLASSIFICATION TEST 


Coefficients of Variability of Differences between Scores for Indicated Grades 


Test | 

| 5&6 6&7 7&8 8&9 9&10 10&11 11&12 5&12 6&9 9&12 
; I 2.0 0,350 0.692 1.0 0.60 | 0,428 0.50 0.056 0.062 0.016 
Pio 5 0-108. 1.40 0.080 0.380 0.75 | 0.333 4.33 | 0.066 0.098 0.022 
| Il 0.166 0.187 0.571 0.285 0.294 1.0 1.0 | 0.045 0.010 0.019 

eet V 0.166 0.727 0.133 0.222 a 2.0 0.40 | 0.055 0.085 3.0 
Vv 0.153 0.111 0.333 1.333 0.411 0.350 1.750 | 0.061 0.102 0.181 
VI 0.184 0.233 0.70 0.152 0.916 0.478 5.50 | 0.072 0.106 0.338 
VII 0.081 0.375 0.444 0.263 1.0 1.50 0.50 | 0.058 0.022 0.250 
Vill 0.125 0.187 2.50 | 0.104 8.0 0.375 0.555 0.038 0.063 0.016 


a The difference was zero. 


TABLE XXII—COEFFICIENTS OF VARIABILITY OF INDIVIDUAL TESTS 
BY GRADES FOR VIRGINIA DELTA I 


Coefficients of Variability of Differences between 


Test Scores for Indicated Grades 
| 8&4 | 4&5 5 &6 | 6&7 | 7&8 3&8 
| | Wietead ees 
I 0.111 O66 0.1295 1 20,411 0.190 0.033 
II 0.055 0.111 | 0.055 | 0.055 | @ 0.015 
III 0.3383 | 0.166 | 020 | 0.285 | 1.0 0.181 
IV 0,069 0.142 0.183 | 0.181 | 0.166 0.027 
Vv 0.052 0.047 0.142 0.068 1.0 | 0.012 


Wal 0.048 | (0.105 0.062 {| 0.073 | 0.214 0.018 


a The difference was zero. 


TABLE XXIII—COEFFICIENTS OF VARIABILITY OF INDIVIDUAL TESTS 
BY GRADES FOR WHIPPLE’S GROUP TEST 


Coefficients of Variability of Differences between 
Scores for Indicated Grades 


a Test V was not used. 


TABLE XXIV—COEFFICIENTS OF VARIABILITY OF INDIVIDUAL TESTS 
BY GRADES FOR PRIMER SCALE 


          Coefficients of Variability of 
          Differences between Scores 
Test      for Indicated Grades 
          1&2       2&3       1&3 
I         0.103     0.150     0.081 
II        0.125     0.136     0.065 
III       0.111     0.30      0.086 
IV        0.081     0.115     0.047 


TABLE XXV—COEFFICIENTS OF VARIABILITY FOR SERIES G AND H 
BY GRADES FOR VOCABULARY SCALE 


Grades      Series G    Series H 
3 & 4       0.072       0.089 
4 & 5       0.088       0.250 
5 & 6       0.096       0.111 
6 & 7       0.166       0.133 
7 & 8       0.186       0.122 
8 & 9       0.272       0.208 
9 & 10      0.285       0.161 
10 & 11     0.555       0.555 
11 & 12     2.000       0.333 
3 & 12      0.019       0.019 
3 & 6       0.033       0.089 
6 & 9       0.041       0.031 
9 & 12      0.083       0.075 


An examination of these tables reveals the reason for
the lack of discriminating power in the high school which is
shown by the Otis, Classification, and Vocabulary scales. The
coefficients of variability are large. Since these coefficients
arise from dividing the probable errors of the differences between
grade scores by these differences, large coefficients indicate
small differences and large probable errors of these differences.
In other words, they indicate little discrimination between
the performance of the grades in question, and the discrimination
that exists is unreliable. When, therefore, relatively
large coefficients are found in connection with the high-school
grades, it means that the materials are not so graded that they
will provide sufficient steps of difficulty for these grades. This
suggests, then, that differentiation in the content of intelligence
scales will be needed for different grade levels. We shall probably
need as much flexibility in intelligence scales as has been
found necessary in educational tests.
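
The computation described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used in preparing the tables: it assumes that the probable error of a difference between two independent grade averages is the square root of the sum of the squared probable errors, and the function names and sample figures are hypothetical.

```python
import math

def pe_of_difference(pe_a, pe_b):
    """Probable error of the difference between two independent averages
    (assumed here to combine as the root of the sum of squares)."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)

def coefficient_of_variability(mean_a, mean_b, pe_a, pe_b):
    """Probable error of the difference divided by the difference.
    Returns None when the difference is zero, the case which the tables
    mark with a footnote."""
    difference = abs(mean_b - mean_a)
    if difference == 0:
        return None
    return pe_of_difference(pe_a, pe_b) / difference

# Hypothetical grade averages and probable errors (not taken from the tables).
well_separated = coefficient_of_variability(40.0, 50.0, 0.6, 0.8)
poorly_separated = coefficient_of_variability(60.0, 60.5, 0.6, 0.8)
print(round(well_separated, 3), round(poorly_separated, 3))
```

With the same probable errors, the pair of grades whose averages lie close together yields the much larger coefficient, which is the condition the tables show for the high-school grades.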


Many details are shown by the tables. For example, it 
will be noted that in Table XXI Test IV in the Classification Test 
does not appear to be properly graded for grades IX to XI. It
may be added parenthetically that Dr. Theisen has been aware 
of this fact for some time, for the same point was revealed by 
his own distributions of the scores in this test. Thus, the tables 
furnish an opportunity for many comparisons. It is not felt 
that it is worth while to call attention to all of them. The chief 
irregularities in the tests have been pointed out in the preceding 
comments. 


IV. Correlations between Equivalent Tests—Every per- 
son who has had even a limited experience with examinations 
and mental test work realizes that individuals do not always 
do their best when placed in a test situation. This variability has 
been noted many times in the literature of school marks and 
similar studies. But since formal tests have become rather 
widely used, many people have neglected this fact and have
assumed that for all intents and purposes the scores made by 
individuals on a test are reliable to a high degree. This assump- 
tion is not always true. 


This investigation afforded an opportunity for a consid- 
eration of this question. Several of the scales contained tests 
that could be considered equivalent. In some cases there was a 
divergence in form and structure of the test which might be 
responsible for variability, but the resemblance was close enough 
to make it seem fair to make the comparison. Accordingly, the 
correlations presented in Table XXVI were computed. 


TABLE XXVI—CORRELATION BETWEEN EQUIVALENT TESTS


Test                               Grade (in ascending order)

Arithmetic in
  Otis and Classification          0.72±0.05
  Otis and Virginia                0.52±0.06 | 0.53±0.12 | 0.51±0.07
  Classification and Virginia      0.58±0.05 | 0.35±0.07 | 0.43±0.07 | 0.70±0.05

Trabue in
  Classification and Group         0.36±0.06

Analogies in
  Classification and Otis          0.11±0.10

Opposites in
  Otis and Classification          0.36±0.11
  Otis and Virginia                0.19±0.11 | 0.56±0.11 | 0.28±0.08
  Classification and Virginia      0.45±0.06 | 0.34±0.07 | 0.47±0.06 | 0.46±0.06

Vocabulary—Series G & H            0.41±0.05 | 0.20±0.04 | 0.38±0.04 | 0.40±0.04 | 0.58±0.04 | 0.55±0.04

Information in
  Virginia and Classification      0.58±0.05 | 0.54±0.05 | 0.62±0.06 | 0.39±0.07

Common Sense in
  Virginia and Classification      0.51±0.06 | 0.30±0.06 | 0.89±0.08 | 0.14±0.08

A casual inspection of this table shows that there is a
great difference in the degree of reliability between comparable 
individual tests. Unless the repetition of a test produces ap- 
proximately the same score for individuals when taken on suc- 
cessive occasions, its reliability may be seriously questioned. 
When individual tests do not show a fairly high degree of relia- 
bility as measured by the correlation between the individual 
scores made on the two trials, one cannot expect a scale com- 
posed of these unreliable individual tests to be very satisfactory. 


It is possible that tests differing in structure and content 
(e.g., Arithmetic and Analogies) may measure the same mental 
functions. The question then arises as to whether it is better 
to give different tests or merely to use different forms of the 
same test. If the same test is repeated without rest periods, 
the results may not be as satisfactory as if different tests are 
used, because children may not concentrate as readily on the 
later repetitions of the test. On the other hand, later repeti- 
tions of a test might secure better results, if novelty was a dis- 
tracting factor during the first attempt. For variety’s sake— 
and variety is the spice of schoolroom testing—it may be desira- 
ble to use tests different in character. In all probability, if the 
scale is to be repeated, the best plan will be to administer the 
different forms on successive days. By this procedure the 
chance physiological variations which no doubt affect the child’s 
efficiency will tend to be neutralized; for the average of several 
performances measured on different days usually gives more 
accurate data about a pupil than a single measure. This point 
is one that lends itself to further study and is one that ought to 
be made the center of careful research in order that we may 
have an adequate foundation for both intelligence and educa- 
tional testing. 


This point may be considered from another angle.
Brown* has given us a formula by which we can determine the
number of repetitions of equivalent tests needed to secure a
reliability of any desired value. With this formula the numbers
of repetitions of a test needed to secure reliabilities of 0.80, 0.90,
and 0.95 have been computed and are presented in Table XXVII.
From this table we may readily conclude that equivalent tests
which do not reveal a correlation with one another of at least
0.50 are of doubtful value, since even at 0.50 nine repetitions
are required to reach a reliability of 0.90. Applying the values
in this table to the correlation coefficients presented for
equivalent tests in Table XXVI, it can be seen that a large number
of the individual tests are of poor reliability if they are to be
used to decide the fate of individuals. When tests are used to
measure groups they are more reliable. This conclusion is
supported by the only correlation coefficient which could be
computed from these data, a value of 0.74±0.05, representing the
correlation between the room scores in the elementary grades for
Series G and Series H, Sentence Vocabulary Scale.


* Brown, Wm. Mental Measurement, Cambridge University Press, 1911, pp.
101-102. Brown’s formula for computing the reliability of the amalgamation
of the scores of similar tests is

$$ r_n = \frac{n\,r_1}{1 + (n-1)\,r_1} $$

where $r_1$ is the correlation between two equivalent tests and $r_n$ is
the reliability of the amalgamated scores of $n$ such tests.


TABLE XXVII. NUMBER OF REPETITIONS OF A TEST NEEDED TO
SECURE RELIABILITY INDICATED (a)


Correlation        Reliability
Coefficient    0.80    0.90    0.95
0.30            10      21      45
0.35             8      17      36
0.40             6      14      29
0.45             5      11      24
0.50             4       9      19
0.55             4       8      16
0.60             3       6      13
0.65             3       5      11
0.70             2       4       9
0.75             2       3       7
0.80                     3       5
0.85                     2       4
0.90                             3


a In some cases the number of repetitions given will result in a reliability slightly better
than that indicated. Round numbers alone are presented because it was felt to be absurd
to speak of fractional repetitions.
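
Table XXVII can be checked by solving Brown's formula for the number of tests. The sketch below is illustrative only: it assumes the formula in its usual form, reliability = n r / (1 + (n - 1) r), and rounds fractional repetitions upward as the footnote to the table describes. The function names are hypothetical, and exact fractions are used so that the rounding is not disturbed by floating-point error.

```python
import math
from fractions import Fraction

def amalgamated_reliability(r, n):
    """Reliability of the combined scores of n equivalent tests whose
    correlation with one another is r (Brown's formula)."""
    r = Fraction(str(r))
    return n * r / (1 + (n - 1) * r)

def repetitions_needed(r, target):
    """Smallest whole number of repetitions reaching the target reliability,
    found by solving Brown's formula for n and rounding upward."""
    r, target = Fraction(str(r)), Fraction(str(target))
    return math.ceil(target * (1 - r) / (r * (1 - target)))

# Recompute a few rows of the table for reliabilities 0.80, 0.90, and 0.95.
for r in ("0.30", "0.50", "0.70"):
    print(r, [repetitions_needed(r, t) for t in ("0.80", "0.90", "0.95")])
```

For a correlation of 0.50 the sketch gives 4, 9, and 19 repetitions, agreeing with the corresponding row of Table XXVII.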


V. Zero Scores in Individual Tests—Zero scores result
from tests which are not suited to the capacities of the child.
Whenever such scores become appreciable they imply that the
test is being administered at a point beneath its lower level of
efficiency. The percents of zero scores for the individual tests
in the different scales are presented in Table XXVIII. Attention
is called to those which have especially high percents of
such scores. They are tests IV and IX, Otis Scale; tests III and
VII, Classification Test; tests IV and V, Virginia Delta I; and
Test IV, Parts II and III, Whipple’s Group Test. There is no
apparent agreement in respect to zero scores among tests of the
same kind occurring in different scales. Those which do not
qualify in some scales apparently do so in others. The difference
in efficiency of these various tests can be attributed to their
degree of difficulty and the way they are administered. The
general information presented by the Vocabulary and Virginia
Delta I scales seems to suggest that literary scales have a low
degree of efficiency in the third grade. There are many children
in this grade who have not sufficiently mastered the mechanics
of literary tests to permit the measurement of their intelligence
by these instruments. A zero score on a test does not imply
zero mentality. It does imply an inferior mentality or lack of
comprehension. In general, it may be said that zero scores, when
they are appreciably numerous, indicate the approximate lower
limits of the efficiency of a test. If a scale contains several
tests which result in zero scores in the same grade, it is clear
that the resulting total score cannot justly be compared with
total scores wherein all of the tests function. What effect this
point would have on the discriminative power of a scale is not
answered by these data. It raises a question which might well
be given careful study.


TABLE XXVIII. PERCENT OF ZERO SCORES IN INDIVIDUAL TESTS
FOR DIFFERENT SCALES

[Percents by grade not recoverable; scales and tests listed only.]


Scale or Test

OTIS
Following Directions
Opposites
Disarranged Sentences
Proverbs
Arithmetic
Geometric Figures
Analogies
Similarities
Narrative Completion
Memory

CLASSIFICATION
Following Directions
Synonym-Antonym
Arithmetic
Common Sense
Completion
Analogies
Number Completion
Information

VIRGINIA DELTA I
I Picture and test numbering as printed:
I True-False
II Arithmetic
III Picture Completion
IV Synonym-Antonym
V Common Sense
VI Information

WHIPPLE
I Arithmetic
II Completion
III Substitution
IV Reasoning Tests, Part I, Part II, Part III
V Punched Hole Test
VI Proverbs

PRIMER
I Dot Pattern
II Classification
III Form Board
IV Absurdities


The question of zero scores should also be considered 
from the administrative point of view. Whenever a test is given 
which the child does not comprehend because of incomplete in- 
struction or difficult subject matter, the problem of copying is 
introduced. Nearly every pupil is accustomed to take cues from 
others when anything is not clear. Consequently, when children 
in groups are asked to do something which is beyond them, they 
look around to see what others are doing. This behavior is not 
such dishonesty as would warrant a severe penalty. However, 
it is an argument against the administration of tests which re- 
sult in many zero scores. 


F—GENERAL COMPARISON OF SCALES 


The foregoing data in Part II furnish many items of 
information concerning the six intelligence scales considered. 
Isolated, these facts do not tell us very much; but when sum- 
marized they prove more valuable. It will be worth our while 
to spend a few moments discussing the criteria by which scales 
may be considered. 
I. Grades for which the Scales are Suitable—If some of
the tests of a scale result in zero scores, it is not advisable to 
administer that scale in grades below the one in which zero 
scores first appear in quantity because the percent of zero scores 
increases very rapidly in successively lower grades and because 
administrative difficulties are raised when pupils are not kept 
busy by a test. The upper limit of a test is determined by the 
difficulty of the material. If total scores do not discriminate 
between grades, the upper limit has been reached. 


II. Ease of Administration—Scales which have complicated
directions, requiring careful study and much practice on
the part of the examiner before they can be administered with 
any guarantee of success, are not so desirable as scales which 
are provided with more simple administrative procedures. Scales 
requiring a long time are preferable only when they are more 
discriminative than those requiring less time. 


III. Difficulties Involved in Scoring—Checking the work
performed by the pupils is a purely clerical task which is much 
the same for the different scales. Sufficient information will be 
furnished on this point by a mere comparison of the amounts of 
time required to score an individual paper. 


IV. Correlation with Scholarship—As has already been 
pointed out, these correlations are rather low. Nevertheless, it 
is felt that the scale which correlates the highest is the best, and 
vice versa. 


V. Reliability—Scales which do not reveal sufficiently 
large differences between the average scores of successive grades 
manifestly cannot discriminate between the different abilities 
presented by the children in those grades. When differences are 
accompanied by large probable errors the reliability of such 
differences is low—i.e. the discrimination, whatever it may be, 
is questionable. 


VI. Individual Tests—Finally, the scales may be con- 
sidered from the standpoint of the degree of balance shown by 
the different individual tests which compose them. If the individual
tests vary markedly in their contributions to the total
score and show inversions between averages for successive 
grades, the scales in which such tests occur are not as valuable 
as those which are well balanced and regular. 


Using the criteria which we have set up in the preceding
paragraphs, we may compare the six scales as indicated in Table
XXIX. It is not always easy to generalize, and this limitation
must be kept in mind as the comparisons are examined. To
some extent they are personal opinions.


TABLE XXIX—GENERAL COMPARISONS OF THE SIX SCALES


Scale | Suitability (Grades) | Administration | Scoring | Correlation with Scholarship | Reliability of Total Scores between Successive Grades | Individual Tests
Otis | 6-12 | Slightly complicated and lengthy | Lengthy | 0.42 | poor | Erratic and poorly balanced
Classification | 5-12 | Slightly complicated and of medium length | Medium length | 0.50 | poor | Regular but poorly balanced
Virginia | | Simple and brief | Brief | 0.59 | good | Regular but poorly balanced
Group | | Simple but lengthy | Very lengthy | a | a | Regular and well balanced
Primer | | Simple and brief | Very brief | 0.34 | fair | Regular and well balanced
Vocabulary | | Very simple and brief | Brief | 0.47 | good | Regular and well balanced

a Data not available.


G—SUMMARY AND COMMENTS 


I. The present intelligence scales admit of much im- 
provement. Some of them are erratic and poorly balanced, 
with comparatively poor reliability for the total scores. Others 
require too much time for administration or scoring. Nearly 
all of them can be improved by the addition of new material or 
by the preparation of different tests for different intelligence 
levels.


II. The best scales for intermediate and grammar grades 
seem to be the Virginia Delta I and the Vocabulary scales. A 
combination of these two seems to offer the best measuring 
instrument. 


III. The Classification Test seems best for the high- 
school grades. 


IV. The Primer Scale is well organized from the stand- 
point of administration, scoring, and balance, but it is of ques- 
tionable diagnostic value. It should always be supplemented by 
other tests. 


V. The value of the different intelligence scales should 
not be determined alone on statistical grounds. There are so 
many factors that influence the work of the child that it is im- 
possible in every instance to forecast his performance in school 
work accurately by means of intelligence tests. There will be 
exceptions in most classes. These exceptions should be analyzed 
in the light of the information available and, in this way, many 
apparent deviations between scholarship and intelligence ratings 
may be satisfactorily explained. 

VI. The present outlook for the derivation and use of
group intelligence scales is good. The field is fertile and there
is every indication that successful scales may be prepared.
PART III—MENTAL SURVEY OF THE CHAMPAIGN
PUBLIC SCHOOLS 


A—INTRODUCTION 


The point of view taken in this portion of this bulletin is 
quite different from that assumed in Part II where our attention
was directed constantly to the individual scales and tests. In 
Part III, on the contrary, we shall consider the child as the unit. 
Intelligence scales function only as they throw light on the men- 
tal ability of children. 


For the purposes that we have in mind in Part III it will 
be most convenient to make the assumption that for practical
purposes the tests which have been used are fairly satisfactory. 
Of course, in the light of what has been pointed out in Part II, 
an assumption of this sort with respect to some of the test scores 
would not be warranted if it were necessary to make individual 
recommendations on the basis of these values. This procedure, 
however, will give us a basis of treatment which may be copied 
by superintendents and teachers who wish to analyze the intelli- 
gence situations in their schools. With this point of view in 
mind, the statistical treatment in this section will be very simple. 
Everything will be determined on the basis of distributions, with 
the median as the measure of central tendency. 


B—VARIATIONS BETWEEN SCHOOLS AND GRADES 


Teachers often remark that the pupils whom they teach 
in one semester are not equal in mental ability to those whom 
they taught in some particular preceding semester. Sometimes 
educators are inclined to charge these opinions on the part of the 
teachers to personal idiosyncrasies, assuming that all school
grades are much alike in character. This assumption is seldom 
justified. Classes vary from year to year in their composition. 
One may have a large number of mediocre pupils in it, with com- 
paratively few bright or dull ones. Another may have the ex- 
tremes with few average pupils. In other words, classes vary 
considerably when the finer points are considered. On the other 
hand, teachers often erroneously assume that there is an intelligence
difference between two classes. They are often led to
make this assumption by the responses of a few superior or in- 
ferior pupils. A few exceptionally bright or dull pupils in a 
room will influence the tone of a group in a manner out of all 
proportion to their number. 


I. Variations as Shown by Median Scores in Intelligence
Scales—The variation between grades in the different schools 
may be shown by the differences between median scores made by 
the children of those grades in the different intelligence scales. 
These median scores are presented in Tables XXX and XXXI. 


An examination of Tables XXX and XXXI shows that 
the same grades in the different schools are much alike in gen- 
eral intelligence when considered from the standpoint of the 
class medians. The median scores show some variation, a part 
of which may be due to their unreliability as measures of the 
group intelligence, and a part of which may be due to real differences
in the grades tested. In general, however, there is a surprising
uniformity and the differences which appear consistently
are the ones which have been recognized by the teachers and 
supervisors. School No. 2 is in the best section of the city of 
Champaign, Illinois, and School No. 5 draws partly from the 
poorest section of the city. The differences between the median 
scores for the grades of these two schools show a decided super- 
iority in both the Vocabulary and Virginia scales in favor of 
School No. 2 in every grade. If we examine School No. 9 
(Table XXXI), we find that the differences between rooms are
very noticeable. The pupils in the six rooms in the eighth grade 
had been classified into sections according to their scholarship 
records. At the beginning of the school year their records in 
the seventh grade had been used as a rough basis for classifica- 
tion. When the year had progressed far enough to give exam- 
inations, the pupils had been tested very carefully, and a re- 
classification had been made on the basis of these examination 
records. As a consequence, Room No. 4 contained mainly the 
superior pupils, while Room No. 3 contained the poorest section. 
The classification of the pupils on the basis of scholarship re- 
sulted in some overlapping between the different sections. The 
teachers recognized this fact, since they rated some of the chil- 
dren in Room No. 3 as of average or above average ability in 
scholarship. The ranks of the different sections based on scores 
in the Vocabulary and the Virginia scales agree closely with the 
ranks previously given these sections by the school officials. 
TABLE XXX—SCHOOL SCORES FOR EACH SCALE BY GRADES


School and Scale 


School No. 1 


ae le eee a? 29 38 | 42 AT 
Accepts Ey dsctle wees | 569 %2 86 96 111 
Primer 50 | 62 72, | 
Hanh jh sea lente at 20 | 98 | 814° 435) 948 
Virginia essere Neen Al pee oe | i | 89 | 105 115 
Primer 48 5S oom} 
MER rhetteney Meneses || ceaebeetes|| ce tees gerne | 59 83 | 97 
School No. 3 | 
Vocabulary aap eatans gecicessameaen|) “e=asseabaeeuke | 30 36 388 | fh 
Virginia PS oe AN | as ee pes ‘ | 79 100 | 118 
Primer 40 54 65 | 
Otis Nise elk ite lreptestercec eee | eens 94 sl lif 
| | | 
School No. 4 
Vocabulary pifeegeceinde || aassescenerstae 14 32 34 42 
Virginia bees Paces etaras ether 44 | 63 81 92 
Primer 43 538 60 | 
School No. 5 | | la 
Vocabulary |isssvaxsnceakps ll linewnattta steven cageeteyseeeere | 28 30 38 | 44 
Vino in ic aan te ip elpcbeell eerac ere A) | 72 T4| © 88 F205 
Primer 41 OA O2 | 
Classification ria eeeeey | aaa ee Pe eee Sere [corse 58 81 | 100 
School No. 6 | | 
Vocabulaicyaeme meena rere erie 8 30 31 30 45 
Wahgeaunty: ee a Op tel eee ee gees | 65 Oe SO amt Og 
Primer 46 54) 66 
School No, 7 | 
Wocabular yas ss meee ee ee 17 | 28 | 
Waligeabowkey 9 = UL et 48 | 58 | 
Primer 37 55 OBB | 
School No. 8 | | 
Vocabulary raed |nae ees eral 24 
Virginia [eer reteset | 48 66 | 
Primer 41 62 70 | 


II. Quartile Variation of Children between Schools by
Grades—The preceding section, which considered merely the
median scores for the different rooms, gave a general picture of
the situation in the city. A general picture, however, is not
entirely adequate. Two fourth-grade classes may have identical
median scores but, when the individual children are considered,
there may be important differences. One may have children all
of whom are of approximately the same mental ability; the
other may have a number of dull children balanced by a corresponding
number of bright children. Manifestly, it will be much
easier to teach the first class than the second. No statistical
presentation can take the place of the detail furnished by the
distribution table. The distribution table can be used to best
advantage, however, only when a few groups of children are under
observation. Under the present circumstances some general
expression of variation is needed: something that will give an
average “scatter” just as the median gives an average score. We
shall use the quartile deviation for this purpose.


TABLE XXXI—SCORES IN SCHOOL NO. 9 FOR EACH SCALE BY ROOMS

[Scores by room not recoverable; grades and scales listed only.]


Grade and Scale

Seventh Grade
  Vocabulary
  Virginia

Eighth Grade
  Vocabulary
  Virginia
  Classification
  Otis


First we may regard the pupils of each grade as consti- 
tuting a single group. Table XXXII shows the median and 
quartile deviation (half the scale distance between the 25- and 
75-percentile) for each test and for each grade. 
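
As a sketch of the two measures just defined, the following computes a median and a quartile deviation for a list of scores. The scores are hypothetical, and the use of Python's `statistics.quantiles` with the inclusive method is an assumption; other conventions for locating the 25- and 75-percentiles give slightly different values.

```python
import statistics

def median_and_quartile_deviation(scores):
    """Return the median and half the distance between the 25- and
    75-percentiles of a list of scores."""
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q2, (q3 - q1) / 2

# Hypothetical scores for one grade on one scale.
scores = [19, 24, 27, 31, 31, 34, 38, 40, 45]
median, quartile_deviation = median_and_quartile_deviation(scores)
print(median, quartile_deviation)
```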


Second, we may consider the deviation of each pupil from
the median of his grade. Clearly if we wish to handle these
deviations together we must express them in a common or at
least a comparable unit. We propose to use the quartile deviations
given in Table XXXII as such units. If, for example, a
fourth-grade pupil scored 43 in the Vocabulary Scale, his deviation
from the fourth-grade median for that test (31) would be
+12. Since the corresponding quartile deviation is 6, his deviation
in terms of the quartile deviation would be +2. Similarly
if a sixth-grade pupil scored 76 in the Virginia Scale, his deviation
in terms of the units of the scale would be -22. One might
think that this deviation is greater than that of the fourth-grade
pupil just mentioned. It is true that numerically and in
terms of scale units it is nearly twice as great. But the variability
of sixth-grade scores on the Virginia Scale is in general
greater, amounting, according to Table XXXII, to 11. In other
words, the expectancy of deviation from the median is greater.
This makes the larger numerical deviation of 22 of less significance.
As before we may express the relation between the individual
deviation and the general measure of dispersion for the
group by dividing the former by the latter. When this is done
we find that, in terms of the quartile deviation for the grade and
test, the individual deviation for the sixth-grade pupil whom we
are considering would be -2.
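
The two worked examples above reduce to a single division, sketched here. The function name is hypothetical; the sixth-grade Virginia median of 98 is not stated in the paragraph but is implied by the score of 76 and the deviation of -22.

```python
def deviation_in_quartile_units(score, grade_median, quartile_deviation):
    """Deviation of a score from the grade median, expressed in units of
    the quartile deviation for that grade and test."""
    return (score - grade_median) / quartile_deviation

# Fourth-grade pupil, Vocabulary Scale: score 43, median 31, quartile deviation 6.
fourth_grade = deviation_in_quartile_units(43, 31, 6)

# Sixth-grade pupil, Virginia Scale: score 76, implied median 98,
# quartile deviation 11.
sixth_grade = deviation_in_quartile_units(76, 98, 11)

print(fourth_grade, sixth_grade)  # 2.0 -2.0
```

Although the raw deviations are +12 and -22, the two pupils stand equally far from their grade medians once each deviation is divided by its own quartile deviation.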


TABLE XXXII. MEDIANS AND QUARTILE DEVIATIONS 


Grade 
Scale an a tee RE Be 
I TE TIL EV he Vy VIS Vie vee 
Vocabulary | | 
Median eee ee) alecssee tl eee 21 | 31 | 33° |40 AT 53 
Quartile Deviation  _.......... | es posi 6 | 6 6 q 7 
Virginia | | 
Median a ha as 48> 1/68 .80 | D8. {CLO ary 
Quartile Deviation |......|/.2..—- TO ate Ort ltl) 12 Ait 
Classification | 
Median Nips es (ee roe cael Sea e 59 83 99 | 110 
Quartile sD eviationme aye leneos ees 12 10 =12 alah 
Otis ; | | H | 
Median Seige ere canal eerie SS dhe ip Ur re A 
Quartile Deviation |... |... A ar A he eT 10 16 13 
Picture Completion | | 
Median fee On OA SOG unos | | 
Quartile Deviation 4 4 3 2 
Primer | | 
Median | 44 | 57 | 64 
Quartile Deviation 9 Uf fe | | 


Table XXXIII gives the distribution of pupils of each
grade according to the deviations of their scores from the 
median, deviations being expressed in terms of the quartile de- 
viation for the grade to which the pupil belonged and for the 
test which he took. If the same pupil took more than one test 
the average of his deviations is given. Thus each child is en- 
tered but once. The medians shown in Table XXXIII are the 
medians of deviations. 


These medians show in a general way the differences be- 
tween the performances of pupils in the several grades. The 
differences in the composition of each grade of the various
schools are made evident by the distributions. Some schools
show a large number of below-average children, while others 
show the opposite condition. School No. 9, where the eighth 
grade was divided into sections based on scholarship achieve- 
ment, shows (Table XXXIV) the decided superiority of Room 
No. 4 and the marked inferiority of Room No. 3. The overlap- 
ping between these rooms indicates the insufficiency of the 
scholarship basis when used in the classification of children into 
sections. The quartile deviation provides a convenient device 
for showing the intelligence composition of rooms and grades. 


TABLE XXXIV. DISTRIBUTION OF PUPILS IN SCHOOL NO. 9 BY ROOMS
AND ACCORDING TO THEIR DISPERSION FROM THE MEDIAN.
THE UNIT IS THE QUARTILE DEVIATION

[Numbers of pupils in each interval not recoverable.]


Grade  Deviation        Room 1  Room 2  Room 3  Room 4  Room 5  Room 6

VII    +1.0 to +1.9
       0 to +0.9
       -1.0 to -0.1
       -2.0 to -1.1
       -3.0 to -2.1
       -4.0 to -3.1
       Total
       Median

VIII   +2.0 to +2.9
       +1.0 to +1.9
       0 to +0.9
       -1.0 to -0.1
       -2.0 to -1.1
       -3.0 to -2.1
       Total
       Median           +0.46   +0.50   -1.31   +1.04   -0.44   -0.56


III. Classification of Children on the Basis of the Intelli- 
gence Quotient. 


a. The quartile deviation of children takes into account 
only their present intelligence status—It is almost a common 
opinion that two people may be rated equal in intelligence, al- 
though they may be widely different in their ability to profit by 
their future experience. We may make this clear by a concrete 
illustration. Clarence S. and Frank V. are in the fourth grade 
and both secure the same scores when given intelligence tests.
Clarence is 16 years old chronologically, while Frank is only 9. 
Manifestly, these two pupils have vastly different educational 
prospects. The older boy probably has reached his intellectual 
maturity and will soon drop out of school. He is a retarded 
pupil with an intelligence quotient of approximately 0.75. On 
the other hand, the younger child is of superior ability with an 
intelligence quotient of 1.10. His future educational prospects 
are bright. He will complete the elementary school; and if he 
enters high school and college, he will probably succeed. Clar- 
ence is the freight train that has started years before, while 
Frank is the express that has overtaken him in his educational 
journey. 
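
The intelligence quotient in this illustration is the ratio of mental age to chronological age. A minimal sketch follows; the mental ages are hypothetical and are not those of the two boys described, and the ceiling of sixteen years on the divisor reflects the assumption, adopted in this study, that the average individual reaches intellectual maturity at that age.

```python
MATURITY_AGE = 16.0  # assumed age of intellectual maturity

def intelligence_quotient(mental_age, chronological_age):
    """Mental age divided by chronological age, with the divisor
    limited to the assumed age of maturity."""
    divisor = min(chronological_age, MATURITY_AGE)
    return mental_age / divisor

# Hypothetical pupils with the same mental age of 12 years.
for chronological_age in (9.0, 16.0, 18.0):
    quotient = intelligence_quotient(12.0, chronological_age)
    print(chronological_age, round(quotient, 2))
```

The same test performance thus yields a high quotient for a young pupil and a low one for an old pupil, which is the point of the comparison between the two boys.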


Before it is possible to convert the scores in the intelli- 
gence tests into mental age values, it becomes necessary to estab- 
lish standards for the different scales. In doing this it is neces- 
sary to make the commonly used assumption that the average 
individual reaches his intellectual maturity at the age of sixteen. 
Further, it is assumed that the highest intellectual develop- 
ment is represented by nineteen years’ mentality. With these 
hypothetical bases, it is a comparatively easy matter to set up 
age standards for the different scales. Although the data availa- 
ble were incomplete at the upper and lower ranges, it is felt that 
an adequate allowance was made for the selected nature of the 
groups from which the standards were obtained. The standards 
used in determining the intelligence quotients of the children are 
presented in Table XXXV for the different scales. Below the 
score for each age group are two figures in italics. With these 
figures it is possible to interpolate ages in years and months. 
These standards may be used as follows: If a child makes a 
score of 42 in the Vocabulary Scale, he has an approximate men- 
tal age of 13.3 years. If his score is 100 in the Virginia Scale, 
his mental age is 13.4 years.
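
The interpolation can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the study: the score ranges are those printed for the Vocabulary Scale in Table XXXV, their assignment to ages 8 through 19 is inferred from the worked example above (a score of 42 giving 13.3 years), and linear interpolation within a range is assumed.

```python
# Score ranges for the Vocabulary Scale, one range per year of mental age.
# Assumption: the first range corresponds to a mental age of 8 years.
VOCABULARY_RANGES = [
    (0, 10), (11, 18), (19, 25), (26, 32), (33, 39), (40, 46),
    (47, 53), (54, 60), (61, 69), (70, 78), (79, 86), (87, 93),
]
FIRST_AGE = 8

def mental_age(score, ranges=VOCABULARY_RANGES, first_age=FIRST_AGE):
    """Mental age in years, interpolated linearly within the score range."""
    for offset, (low, high) in enumerate(ranges):
        if low <= score <= high:
            fraction = (score - low) / (high - low + 1)
            return round(first_age + offset + fraction, 1)
    raise ValueError("score lies outside the tabled ranges")

print(mental_age(42))  # 13.3
```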


b. Distributions of children by intelligence quotients— 
The standards presented in Table XXXV made it possible to de- 
termine the mental age of each child as revealed by the scores 
that he made in the tests. Here again it was felt that the average 
of several figures is a better index than any one of them; hence 
the mental ages given by the various test scores were averaged 
for each of the children. This average mental age was divided 
by the chronological age, giving the intelligence quotient. In the 
high school where the children were older than 16 years chrono- 
logically, it was necessary to use 16 years as the divisor, since 
TABLE XXXV. STANDARD SCORES FOR INDICATED AGES


4 yrs. 
Seale to 5.0-| 6.0—| 7.0—]| 8.0—|9.0— |10.0—]11.0—|12.0-|18.0—; 14.0—]} 15.0-—|16.0— |17.0— |18.0— | 19.0— 
4.9 yrs. ee ae 
| i | 
Vocab- 5 15 22 29 36 43 50 57 65 74 82 90 
ulary 0-10 | 11-18} 19-25 | 26-32 | 33-39 | 40-46 | 47-53 | 54-60 | 61-69 | 70-78 | 79-86 | 87-93 
cae ne 15 36 538 | 65 76 87 99 115 129 145 
Virginia 0-25 | 26-44 | 45-59| 60-70 | 71-81 | 82-93 | 94-107) T08-123 | 124-137 | 138-155 
Classifi- 15 | 35 50 65 78 93 106 122 140 163 186 
cation 0-25| 26-42 | 43-57 | 58-71 | 72-85 | 86-99 | 100-IT4 | 115-132 | 133-152 | 153-174| 175-197 
Otis 40 65 85: | 102 118 135 148 162 176 190 
C 20-52 | 53-74 | 75-93 \94-108| 109-126 | 127-141 | 142-155 | 156-169 | 170-183 | 184-105 
: wen 2041) 321 48) 64 | 66 | 78 | 86 
Primer 0-12 | 13-26] 27-37| 38-48) 49-60 | 61-72) 73-83 | 84-89 


ta 


TABLE XXXVI. DISTRIBUTION OF INTELLIGENCE QUOTIENTS BY 
SCHOOLS AND GRADES 


School | School | School | School School | School | Schoo} Sehool 
fazed 1. Q. No.1 | No.2 1 No.8 | No.4 | No.5 | Nove ae Nog ace! 
I | 0.50—0.59 ea tae iGe. 3° 
0.60—0.69 9 0 1 3 
0.70—0.79 1 9 1 9 0 6 
0.80—0.89 3 2 4 0 0 0 9 
0.90—0.99 2 5 2 1 0 3 | 13 
1.00—1.09 4 tf 5 6 1 4 7 
1.10-1.19 8 3 6 a | 8 2 | 24 
1.20-1.29 5 1 3 5 1 1 | 16 
1.30—1.39 2 3 1 1 1 8 
1.40—-1.49 1 1 
Total 25 | 22 Sdn ih lee elie aeged 10 
Median 1.14 | 1.04 | 1.04 | 1.10 | 1.05 | 1.05 | 1.08 
Il | 0.50—0.59 | 2 2 
0.60—0.69 1 1 1 3 
0.70—0.79 i 0 1 mh 4 A 
0.80—0.89 0 2 3 1 3 2 11 
0.90—0.99 3 9 4 2 1 1 20 
1.00—1.09 Ble iOe ule 10 4 2 2 33 
1.10-1.19 12 6 3 3 5 3 32 
1.20—1.29 5 8 2 sept 19 
1.30-1.39 1 1 1 3 
1.40-1.49 1 1 
rete ae on ae oe, bad | 16. 8 || 128 
Median Sieata | Lor |) 1.04 11.00 | 1,11 | 1.08 1.07 
Tl | 0.60-0.69 1 1 1 1 4 
0.70-0.79 | 1 1 1 1 il ree ae tt 
0.30—0.89 | 3 1 1 6 D) 1 2 Bey ot 
0.90-0.99 | 5 6 6 9 6 2 4 7 | 45 
1.00-1.09} 6 8 Sa ih te 9 2 3 7 | 55 
(pete) 6m | 10 7 3 5 2 3 7. | 43 
1.20-1.29| 6 4 3 1 2 1 y) 1 | 20 
1.30=1.29 | <3 2 1 1 1 1 9 
1.40—1.49 2 | 82 
per a0, 84) 27) at) 27), 10° 1 20, | 28 | 210 
= — - ——— - =| - S| | 
Median (110 |1.11 | 1.06 | 1.00 | 1.04 | 1.05 | 0.98 | 1.01 | 1.04 
“Iv | 0.60—0.69 1 1 1 3 
YY e907) 4 2 2 2 9 {oa t.0 
).30—-0.89 | 2 4 5 5 6 4 3 | 26 
0.90-0.99 | 5 4 5 q 6 1 4 7 39 
1.00-1.09 | 6 8 7 8 5 3 2 Be aed 
Ot AOC 7 8 8 6 5 cal a 2 | 39 
1.20-1.29 | 5 8 Fy 4 iP 2 1 1 27 
1.30—1.39 | 2 2 4 2 1 2 | 13 
1 AO= 1A lo 2 4 { 7 
cise), ana 2 ’ 3 
RCE an Si aT 1 37 le 84,4 26 17. | 22 | 211 
Median 1 02 /etia7 | 4.09.1 1.04 | 0.97 | 1.08 10.93 | 0.99." 1.06"

TABLE XXXVI (Continued)


School | School School School | School | School | School | School 


Grade} 1 No fot No.2) 4 Noe | Nola’! Na&| Noe NaI4 Nasties 
V | 0.50—0.59 Tare: i 4 4 
0.60—0.69 “ll 1 1 5} al F 
0.70—0.79 2 0 33 3 T Z 17 
0.80—0.89 S 3 5 fi 8 3 29 
0.90—0.99 4 5 9 9 Ui 4 38 
1.00—1.09 inf 7 9 13 8 4 48 
1.10—1.19 i 9. 5 4h 6 5 389: 
1.20—1.29 4 ui 3 Ye 2 3 21 
1.80—1.39 2, 2 3 2; Za 2 13 
1.40—1.49 1 1 2 4 
Total 31 36 39 43 46 25 220 
Median TPOSM eee teOte tet Ole O.9 6 med Oe 1.038 
VI | 0.60—0.69 we af ih 1 | 5 
0.70—0.79 S il 2 6 al i a3 
0.80—0.89 4 2 9 8 4 2 29 
0.90—0.99 53 3 18 £2, 9 4 46 
1.00—1.09 ‘a 8 aml 1) 9 5 1] 57 
.1.10-—1.19 i 10 10 10 7 33 47 
1.20—1.29 4 Zi 5 4 4 1 25 
1.380—1.389 1 5 a 1 ay 9 
1.40—1.49 2 al 0 | | 3 
1.50—1.59 al i | ak 
Total 33) 38 53 60 34 17: 235 
Median WOAG |) DU OL OZ St O44 aR OL 1.04 
Vil ‘| School No. 9 
Room | Room 
1 2 

0.50—0.59 1 1 
0.60—0.69 ih 1 0 2 
0.70—0.79 1 al 2 pe 2 6 2 16 
0.80—0.89 5 5 af vf 72 6 4 380 
0.90—0.99 if 6 ES 9 2 4 9 42 
1.00—1.09 9 9 7 12 4 5 6 52 
1.10—1.19 4 ial 2 2 4 5 3 31 
1.20—1.29 5 4 2 1 il 3 4 20 
1.30—1.39 ul 2 3 
Total oS 38 19 33 16 30 28 197 
Median 12032)) 1.08 ete. 02 0.98 | 1.03 | 0.95 | 0.99 | 1.01 


it has been assumed that the average person reaches his mental
maturity at this age. These intelligence quotients may be
distributed to show the composition of the different grades in each
school. It might be added that these values are not so reliable
for children who scored low as they are for those who scored
high. A child may score low through other factors than the lack
of intelligence. Consequently, there are somewhat more children


TABLE XXXVII. DISTRIBUTION OF INTELLIGENCE QUOTIENTS IN THE
EIGHTH GRADE OF SCHOOL NO. 9 BY ROOMS

                     Room Number
I. Q.          1      2      3      4      5      6    Total
0.70-0.79                    2                    1       3
0.80-0.89      1      1      7             3      4      16
0.90-0.99      4      3     10      2      7     10      36
1.00-1.09     11      9      6      6     10     10      52
1.10-1.19     11     12      2     12      8      5      50
1.20-1.29      4      5      1      9      3      2      24
1.30-1.39      1      3             2             1       7
Total         32     33     28     31     31     33     188
Median      1.10   1.13   0.95   1.16   1.08   1.01    1.07


TABLE XXXVIII. DISTRIBUTION OF INTELLIGENCE QUOTIENTS IN THE
HIGH SCHOOL BY GRADES

                     Grade
I. Q.         IX      X     XI    XII    Total
0.70-0.79                           1        1
0.80-0.89      4      3      3      0       10
0.90-0.99     40     41     23     16      120
1.00-1.09     44     40     48     33      165
1.10-1.19     43     37     41     38      159
1.20-1.29     18      8     15     17       58
1.30-1.39      8      4      2      1       15
1.40-1.49      2                             2
Total        159    133    132    106      530
Median      1.08   1.06   1.08   1.11     1.08


showing an intelligence quotient equivalent to that of defective 
mentality than the facts in the case probably warrant. The ex- 
cess, however, is not thought to be large; because, with the ex- 
ception of the first two grades, the intelligence quotient is the 
combined value of the scores from several scales. As a rule, 
these scores varied but little with the very low-grade children, 
although the tests were given at different times, usually weeks 
apart. 
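
The computation of the quotient described above (the mental ages from the several scales are averaged and divided by the chronological age, with 16 years as the largest divisor) may be sketched as follows. The function name and the pupils' figures are hypothetical and serve only as an illustration.

```python
MATURITY_AGE = 16.0  # age of intellectual maturity assumed in the text


def intelligence_quotient(mental_ages, chronological_age):
    """Average several mental ages and divide by the capped chronological age."""
    average_mental_age = sum(mental_ages) / len(mental_ages)
    divisor = min(chronological_age, MATURITY_AGE)
    return average_mental_age / divisor


# Hypothetical pupils; these figures are illustrative, not survey data.
print(round(intelligence_quotient([13.3, 13.4, 12.9], 12.0), 2))  # 1.1
print(round(intelligence_quotient([17.0, 16.6], 17.5), 2))        # 1.05
```

The second pupil is older than 16, so the divisor is 16 rather than 17.5.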

The intelligence quotient is extremely significant. Al- 
though it has not been definitely established, the opinion seems to 
be that it remains approximately constant through life. Oc- 
casional exceptions seem to appear, but it is probable that the 
rule holds as steadily as do most rules regarding mental or 
physical characteristics. On the basis of the size of the
intelligence quotient a number of classifications have been made.
Terman’s is commonly accepted. It is given in Table XXXIX.


TABLE XXXIX. IMPLICATIONS OF INTELLIGENCE QUOTIENTS

I. Q.          Classification

Above 140      “Near” genius or genius
120 to 140     Very superior intelligence
110 to 120     Superior intelligence
90 to 110      Normal or average intelligence
80 to 90       Dullness, rarely classifiable as feeblemindedness
70 to 80       Borderline deficiency, sometimes classifiable as
               dullness, often as feeblemindedness
Below 70       Definite feeblemindedness
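
The classification of Table XXXIX may be applied mechanically, as in the sketch below. The limits of adjoining classes overlap in the table (110 appears twice, for example), so the sketch assumes that each class includes its lower limit and excludes its upper limit. That rule and the names `TERMAN_CLASSES` and `classify` are assumptions introduced for illustration.

```python
# Lower class limits from Table XXXIX on the 100-point scale.
# Assumption: each class includes its lower limit and excludes its upper one.
TERMAN_CLASSES = [
    (140, '"Near" genius or genius'),
    (120, "Very superior intelligence"),
    (110, "Superior intelligence"),
    (90, "Normal or average intelligence"),
    (80, "Dullness, rarely classifiable as feeblemindedness"),
    (70, "Borderline deficiency, sometimes classifiable as dullness, "
         "often as feeblemindedness"),
]


def classify(iq):
    """Return the Table XXXIX class; ratios such as 1.06 are also accepted."""
    # Values below 10 are treated as ratios (1.06) and rescaled to points (106).
    points = iq * 100 if iq < 10 else iq
    points = round(points, 6)  # guard against floating-point noise
    for lower_limit, label in TERMAN_CLASSES:
        if points >= lower_limit:
            return label
    return "Definite feeblemindedness"


print(classify(1.06))  # Normal or average intelligence
print(classify(125))   # Very superior intelligence
print(classify(0.65))  # Definite feeblemindedness
```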


The intelligence quotients presented in Tables XXXVI to
XXXVIII inclusive may be summarized into a single tabulation.
This is presented in Table XL. The median of this table raises
an important question. Its value is 1.06, which implies one of
two things: either that the standards established for the dif- 
ferent age groups have been set a little low or that Champaign 
represents a group slightly above the average in mentality. It 
is the opinion of the writer that the latter conclusion should be 
drawn. 
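
The method by which the medians of these tables were computed is not stated. A minimal sketch, assuming the usual interpolation within the class interval that contains the middle case, is given below. Applied to the Total column of Table XXXVIII it gives 1.08, which agrees with the median printed in that table. The function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_limits, frequencies, width=0.10):
    """Median of a frequency table by interpolation inside the median class."""
    half = sum(frequencies) / 2
    cumulative = 0
    for lower, count in zip(lower_limits, frequencies):
        if count and cumulative + count >= half:
            # Assumed rule: cases are spread evenly across the class interval.
            return lower + (half - cumulative) / count * width
        cumulative += count
    raise ValueError("empty distribution")


# Total column of Table XXXVIII (high school), classes 0.70-0.79 to 1.40-1.49.
limits = [0.70, 0.80, 0.90, 1.00, 1.10, 1.20, 1.30, 1.40]
totals = [1, 10, 120, 165, 159, 58, 15, 2]
print(round(grouped_median(limits, totals), 2))  # 1.08
```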

An examination of Table XL gives considerable material 
for speculation. In the light of the percents shown in this dis- 
tribution, it might be said that 1.9 percent of the school popula- 
tion in Champaign was definitely feeble-minded. Of course, a 
few pupils may have fallen into this group because they failed in 
the test through physical causes and not through lack of mental 
ability. But on the other hand, it might be added that there 
were several children in the schools who were too low in men- 
tality to take the tests. Due to the fact that the teachers did not 
always give the data for these children, several of them were not 
included in the study. Some of them are hopeless imbeciles and 
probably will never learn to read or write. 

If we adopt the standards of normality which are usually
used, we shall find that only 46 percent of the school population
is in the normal group, i.e., in the group which ranges from
0.90 to 1.10. The next higher group, 1.10 to 1.19, contains 23
percent. These are the two groups in which the greatest number
of cases occur. Thus about seven out of every ten children in
the Champaign schools appear to be either “normal” or “superior,”
using Terman’s classification. Below this large central
group we have approximately 15 percent of the school population;
above it, almost 17 percent.


TABLE XL. INTELLIGENCE QUOTIENT DISTRIBUTIONS FOR THE
ENTIRE SCHOOL POPULATION

In its school administration Champaign should make an
attempt to provide for these two extreme groups. At present 
they are found in the regular classes. There are but few at- 
tempts to meet the special needs of these children. The problem 
is not so complicated as this table might lead one to believe. 
The poorest 15 percent are found almost entirely in the grades 
below the eighth. Moreover, they are much more numerous in 
some sections of the city than in others. This fact will make it 
possible to provide special classes for them in which the curricu- 
lum can be modified to meet their particular needs. This group 
probably should be provided with more of the vocational and 
less of the academic subjects. It is probable, for example, that a
course in dishwashing, sweeping, cleaning, and in other simple 
household duties would be very beneficial to the girls who have 
intelligence quotients below nine-tenths. An examination of 
Table XXXVI shows that there are a sufficient number of these 
children to provide classes of economical size from the adminis- 
trative point of view. 

The 17 percent who are above the large central group 
should also be specially provided for. These are the children
who could make rapid progress through the school, if given the
opportunity. It is decidedly unsatisfactory to give these chil- 
dren the same bill of fare as is provided for the average and the 
mediocre. In instruction one “exposure” of most topics is suffi- 
cient for them. A single reading of their lesson suffices, where 
the average or mediocre child must read it several times. From 
every point of view it is wasteful to keep these children in the 
same classes with the other children. It is not altogether
satisfactory to give them rapid promotion from grade to grade, and
thus to allow them to skip parts of the work. Consequently,
the only sensible thing to do is to provide special classes for
them.

Special classes, especially those for subnormal children, 
will no doubt meet with some opposition on the part of the par- 
ents, if the classes are established too abruptly and without the 
utmost tact. The present classification of the children in the 
city, which permits the transfer of a child from one school to an- 
other for administrative reasons, also permits a grouping of the 
children which will secure these desired results without any 
special advertising of the fact. 

c. Age-grade intelligence-quotient groups—Regarding 
the child three facts not thus far combined in our tables are of 
special administrative importance. These are his chronological 
age, his grade in school, and his intelligence quotient. Manifest- 
ly, if a child is chronologically older than the normal age for the 
grade in which he is located and has an intelligence quotient that 
indicates approximate normality, the sensible thing to do is to 
promote him to the next higher grade, giving him extra atten- 
tion so that he may meet the deficiencies in his scholarship pre- 
paration which result from this unusual progress. Champaign 
would experience no difficulty from the double promotion of such 
children, because the city has provided “opportunity classes” 
which all children who are maladjusted may attend in order that 
they may receive extra attention and make up work. These 
“opportunity classes” could provide for children who are now
found to be out of place and do it without special effort. 

The three facts which we have mentioned—age, grade, 
and intelligence quotient—may be presented in tabular form, 
thus furnishing a convenient means for discovering how many 
children may be considered improperly placed. Table XLI gives 
the distributions of intelligence quotients at each age for the
sixth grade and may serve as an example of this type of
representation. An examination of this table shows that comparatively
few retarded children have an intelligence quotient high
enough to warrant their promotion. Perhaps eleven of the 13 
year-olds and one of the 14 year-olds might be advanced. But 
when we consider the fact that the intelligence level of the 
Champaign system is 1.06 it may be questioned whether it is 


TABLE XLI. AGE-INTELLIGENCE-QUOTIENT DISTRIBUTION FOR 
SIXTH GRADE 


Age in Years 


10 | 11 | 12 | 13 | 14 | 15

0.50—0.59 
0.60—0.69 | ara id 2 
0.70—0.79 1 8 9 8 
0.80—0.89 1 10 17 5 2 
0.90—0.99 1 7 26 14 2 
1.00—1.09 20 26 8 1 
1.10-1.19 1 23 15 3 
1.20-1.29 i 11 1 
1.30—1.39 1 5 | 
1.40-1.49 | 1 | 
pseri59 | ot 

| By aos eee 
Toa. || (S 68 79 50 21 8 
Medianelyat2e isa 1.04 0.90 0.77 | 0.17

advisable to promote any of these children except the one who is
14 years old and whose intelligence quotient is more than 1.00. 
In other words, there is comparatively little maladjustment from 
this point of view and certainly no extreme maladjustment. This 
table, which is typical of the other grades, emphasizes the fact 
that we shall always have retardation in our schools as long as 
children are grouped in classes where all must take the same 
curriculum without reference to their ability. It seems to the 
writer that the most sensible provision which can be made is to 
classify the children into at least three groups based on the in- 
telligence quotient and then to prepare courses of study suited 
to these groups. 

So far as ability to do higher-grade work is concerned it 
is not the older but the younger children who are really retarded. 
The reader will observe the consistently higher ranges of 
intelligence quotients among the younger children. The 15 
twelve-year-old children whose intelligence quotients are between
1.10 and 1.19 have an average mental age of 13.8 years.
If they had received proper instruction and had been advanced
according to their ability they would be in the seventh grade— 
some of them in the eighth. Similarly the eleven eleven-year-olds
whose intelligence quotients are between 1.20 and 1.29 have
mental ages which would entitle them to belong to the seventh
grade. The five children of the same age who have intelligence 
quotients between 1.30 and 1.39 would be in the eighth grade, 
if their advancement had kept pace with their mental develop- 
ment. Whether such children should suddenly be promoted to 
the grade to which by mental age they belong is a debatable 
question. Such a belated adjustment would be at best a make- 
shift compared to the gradual adjustment that would have been 
possible if these children could have been identified early in 
their school career. Meanwhile, however, it is proper to point 
out that when pupils have been boldly promoted in accordance 
with their mental ages, even after they have been “discovered” 
relatively late in their school course, they have usually
maintained themselves with credit in their advanced grades.
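
The mental ages quoted in this paragraph follow from the definition of the quotient: the mental age is the chronological age multiplied by the intelligence quotient. The sketch below assumes that the midpoint of each quotient class (1.15, 1.25, 1.35) and the whole-year age are used. That choice is an inference from the figure of 13.8 years and is not stated in the text.

```python
def implied_mental_age(chronological_age, iq_midpoint):
    """Mental age implied by a chronological age and a class-midpoint I.Q."""
    return chronological_age * iq_midpoint


# (age in years, assumed midpoint of the I.Q. class) for the groups discussed.
groups = [(12, 1.15), (11, 1.25), (11, 1.35)]
for age, midpoint in groups:
    print(age, midpoint, f"{implied_mental_age(age, midpoint):.2f}")

print(round(implied_mental_age(12, 1.15), 1))  # 13.8
```

For the twelve-year-olds in the 1.10 to 1.19 class the product is 13.8 years, the value given above.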


C—SUMMARY AND COMMENTS 


I. Schools vary appreciably in the distribution of pupils 
according to mentality. Some of the rooms in the same school
are much superior to others. These large differences might
have been anticipated from the general opinions of the teachers 
and supervisors. 


II. Each child’s departure from the median of his grade, 
in terms of the quartile deviation, serves as a convenient means 
of comparing his performance with that of another child wheth- 
er the latter be in the same grade or not—or indeed whether he 
has taken the same test or not. This measure serves also to in- 
dicate the large irregularities in the classification of the children. 


III. The intelligence quotient is a better means of 
measuring the individual variability of children within a room 
because it emphasizes the educational possibilities of each child. 
It thus becomes the best device, from the standpoint of the 
teacher, for measuring the “brightness” of the children. 


IV. Grouping the children in Champaign on the basis 
of intelligence quotients, 69 percent of the school pupils may be 
considered of approximately uniform ability. Above this central 
homogeneous mass there are 17 percent who are above average 
and of high ability. Below it are 15 percent who are as inferior
as the others are superior. Special provisions with modified
curriculums should be made for each of these two groups.


V. An age-intelligence-quotient table for each grade 
furnishes the best device for analyzing a school situation. This 
representation reveals the situation at a glance. If there are any 
maladjustments among the older groups, the fact is self-evident 
when the data are tabulated in this form. When the pupils who 
are out of place have been identified, it is a comparatively simple 
matter to apply the necessary remedies. 
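
The measure named in item II, a child's departure from the median of his grade in terms of the quartile deviation, may be sketched as follows. The quartile deviation is taken as half the distance between the first and third quartiles. The scores are hypothetical, and the particular rule for locating the quartiles (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not specify one.

```python
import statistics


def quartile_deviation_units(score, grade_scores):
    """Express a score as a distance from the grade median in units of Q."""
    q1, median, q3 = statistics.quantiles(grade_scores, n=4, method="inclusive")
    quartile_deviation = (q3 - q1) / 2
    return (score - median) / quartile_deviation


# Hypothetical scores for one grade on one test (not survey data).
grade_scores = [10, 12, 14, 16, 18, 20, 22, 24, 26]
print(quartile_deviation_units(26, grade_scores))  # 2.0
print(quartile_deviation_units(14, grade_scores))  # -1.0
```

Because the result is expressed in units of the grade's own spread, values from different grades or different tests may be set side by side.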
APPENDIX


I. Incomplete Pictures as Tests—The incomplete picture
has been used consistently as a mental test since Binet
introduced it in his scale. When it became necessary to devise a
test for illiterates and foreigners in the army, incomplete
pictures were accepted as a promising form of test. In the Beta
examination devised by the army workers this form of test was
used with adults of varying degrees of mental maturity,
although the value of the incomplete picture as a test was
comparatively unknown. Moreover, the data presented in Part II
for the Virginia Delta I Scale raise the question of the
suitability of the picture completion test for children beyond the
primary grades because they show almost no discriminative
power for grades III to VIII. In the light of these facts it seems
worth while to consider the value and possibilities of incomplete
pictures as tests. Bearing upon this question are the results of a
study of this form of test which has been made by the writer.

The data secured show that very few purely incomplete 
pictures are difficult enough to test the intelligence of normal 
children who are more than nine years of age. Furthermore, 
normal children of the primary grades are sometimes very much 
puzzled by incomplete pictures, although they may show a
reasonable degree of keenness in other respects. In other words,
the ability to recognize the omissions is a more or less special- 
ized ability which depends to a certain extent on the type of ex- 
perience the individual has had. When the problem of investi- 
gating the incomplete picture was first attacked it was fondly 
hoped that a series of incomplete pictures might be found which 
would present a range of difficulty capable of testing pupils from 
the primary through the grammar grades. This hope was not 
realized and the writer is led to the conclusion that incomplete 
pictures are of little value as tests of intelligence above the 
primary grades. 


Thirty-eight incomplete pictures were submitted to ap- 
proximately one thousand four hundred children. Of the total 
number of pictures presented twenty were found to be suitable 
for use in a test for the primary grades. These are presented 
in a “Picture Completion Test” published by the Bureau of
Educational Research. Figure 1 shows this test.

FIGURE 1. PICTURE COMPLETION TEST BY DR. CHARLES E. HOLLEY

The difficulty in terms of the percent of correct responses
was obtained for each of these twenty pictures. Table XLII 
shows these percents for the first three grades. 


TABLE XLII. PICTURE COMPLETION TEST. PERCENTS CORRECT
FOR EACH PICTURE BY GRADES


II. Sex Differences—The data gathered in Champaign
from all the tests were examined from the standpoint of the
difference between boys and girls. The results were such as to
lead one to conclude that there are no real sex differences in
general intelligence which may be revealed by these general
tests. The medians and averages for the two sexes were approximately
the same in the various tests, all differences being
small enough to justify one in attributing them to chance factors.


III. Administration by Teachers—It is highly desirable
to have intelligence scales that can be administered by the in- 
dividual teacher. If, however, several rooms are to be compared, 
it is better, in practice, for a supervisor to give the tests than for 
the individual teacher to do so. Although most teachers will do 
their best to follow instructions accurately and thus to secure 
uniform results, a small minority will persist in varying conditions
to suit their own ends. From many points of view, it will
be better if the pupils take intelligence tests under the direction
of supervisors or of persons other than the room teachers. The 
results will be comparable from room to room and they may be 
made the basis of administrative measures in a way that would 
not be the case if their reliability were in doubt. Since nearly 
every test requires practice for its successful administration, the 
supervisor who administers a test several times becomes prac- 
ticed in its details and thus secures the complete cooperation 
of the pupils. With the more difficult scales the supervisor can 
take the time to perfect his method by practicing on his friends 
before he administers the scale in the classroom where the re- 
sults are important. Individual teachers cannot spare the time 
and trouble needed to perfect their technique even if they are 
entirely in sympathy with the work. Consequently, it is much 
better if all tests of this nature are administered by supervisors. 


IV. Scoring—If mental tests are to be used in a way 
that will contribute most to school problems they must be scored 
very accurately. Where a child’s future is to be influenced by 
the result it is vitally important that his score be as nearly cor- 
rect as the test will permit. 


The best results are secured if the scoring is done by a 
few careful workers who have been trained for their duties in- 
stead of by a larger number of people who devote only a little 
time individually to the work. It requires much valuable time to 
instruct the scorers in the methods of evaluating the various 
parts of the pupil’s answers, and as a rule, several days of 
practice are needed before they can become proficient. Con- 
sequently, paid trained workers are a decided economy over vol- 
unteer workers. As far as possible nothing should be left to 
independent judgment. Instructions should be prepared which 
will cover every possible case. The system of indicating rights 
and wrongs on the tests should be worked out so carefully that 
it will economize time and eliminate chances for error. All of 
these details should be covered in the training of the clerical 
workers. 


It is possible to secure good clerical workers among the 
student body of the average high school and college. Scoring 
takes good eyesight and ability to learn quickly. People who 
score for a few days on a test acquire increased proficiency from 
day to day and reach their maximum in about a week. This fact 
suggests that it is uneconomical to employ a large number of 
workers for a small project. It is better to secure a few capable
workers and allow them to perform the same type of work for
a longer time than would be needed for the larger group. They 
will become very skilled and will do the work at a much smaller 
unit cost. 


It is best to rescore or check nearly every operation that 
is involved in rating a set of papers. This checking should be 
done by persons other than those who scored the papers the first 
time. A second scoring will catch most of the errors. 
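
Where the two scorings are recorded, the check just described amounts to a comparison of the two sets of scores. The sketch below lists every paper on which the first and second scorers disagree, so that only those papers need be examined again. The paper labels and scores are hypothetical.

```python
def scoring_discrepancies(first_scoring, second_scoring):
    """List the papers on which two independent scorings disagree."""
    flagged = []
    for paper_id in sorted(first_scoring.keys() | second_scoring.keys()):
        first = first_scoring.get(paper_id)
        second = second_scoring.get(paper_id)
        # A paper missing from either scoring appears as None and is flagged.
        if first != second:
            flagged.append((paper_id, first, second))
    return flagged


# Hypothetical scores recorded by two different scorers.
first = {"paper-01": 42, "paper-02": 57, "paper-03": 36}
second = {"paper-01": 42, "paper-02": 75, "paper-03": 36}
print(scoring_discrepancies(first, second))  # [('paper-02', 57, 75)]
```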


Stencils facilitate the work of scoring by economizing
eye movements and pencil marks. Some types of material lend 
themselves to the use of transparent stencils. These stencils 
may be made from the celluloid used as window material for 
automobile curtains. Transparent paper may also be used— 
especially when durability is not necessary. Ink dots or lines 
may be so placed on this material that they will coincide with 
the marks that the pupils must make in indicating the right 
answers to the tests. In other types of tests cardboard stencils 
may be made which will enable the scorers to check the answers 
quickly. Every device should be employed which will economize 
time and insure a high degree of accuracy. Any device which 
leads to simplicity removes sources of error. 


